GENERATION AND SELECTION OF NOVEL BINDING PROTEINS

BACKGROUND OF THE INVENTION

Field of the Invention

This invention relates to development of novel
binding proteins by an iterative process of
mutagenesis, expression, chromatographic selection, and
amplification.

Information Disclosure Statement

The amino acid sequence of a protein determines
its three-dimensional (3D) structure, which in turn
determines protein functioning (EPST63, ANFI73). A
widely accepted system of classifying protein structure
may be found in Schulz and Schirmer (SCHU79, Ch5).
Their classification system is adopted herein.

Shortle (SHOR85), Sauer and colleagues (PAKU86,
REID88a), and Caruthers and colleagues (EISE85) have
shown that some residues on the polypeptide chain are
more important than others in determining the 3D
structure of a protein. The 3D structure is
essentially unaffected by the identity of the amino
acids at some loci; at other loci only one or a few
types of amino acid is allowed. In most cases, loci
where wide variety is allowed have the amino acid side
group directed toward the solvent. Loci where limited
variety is allowed frequently have the side group
directed toward other parts of the protein. Thus
substitutions of amino acids that are exposed to
solvent are less likely to affect the 3D structure than
are substitutions at internal loci. (See also SCHU79,
p169-171 and CREI84, p239-245, 314-315).

The secondary structure (helices, sheets, turns,
loops) of a protein is determined mostly by local
sequence. Certain amino acids tend to be correlated
with certain secondary structures and the commonly used
Chou-Fasman (CHOU74, CHOU78a, CHOU78b) rules depend on
these correlations. The correlations between amino-acid
type and secondary structure are not, however,
absolute, and every amino acid type has been observed
in helices and in both parallel and antiparallel
sheets. Kabsch and Sander (KABS84) report on
pentapeptides of identical sequence found in different
proteins; in some cases the conformations of the
pentapeptides are very different. Argos (ARGO87)
surveyed pentapeptides of similar sequence in different
proteins and found that the structures of the
sequence-similar subsequences were frequently different.

The residues that join helices to helices, helices
to sheets, and sheets to sheets are called turns and
loops and have recently been classified by Richardson
(RICH81), Thornton (THOR88), Sutcliffe et al. (SUTC87b)
and others. Insertions and deletions are more readily
tolerated in loops than elsewhere. Thornton et al.
(THOR88) have summarized many observations indicating
that related proteins usually differ most at the loops
which join the more regular elements of secondary
structure.

When the amino acid sequence of one protein has
been changed to be more like the sequence of a second
protein, the properties of the novel protein usually
approach the properties of the second protein. Wells
et al. (WELL87a) reported that changing three residues
in subtilisin from Bacillus amyloliquefaciens to be the
same as the corresponding residues in subtilisin from
B. licheniformis produced a protease that had nearly
the same activity as the subtilisin from the latter
organism. There were 82 differences remaining in the
sequences. The three residues changed were chosen
because they were the only differences within 7
Angstroms (A) of the active site.

Many proteins bind non-covalently but very tightly
and specifically to some other characteristic
molecules. Schulz and Schirmer summarize many
observations on the binding of proteins to other
proteins (SCHU79, p98-105). For example, haemoglobin
alpha chains bind very tightly to haemoglobin beta
chains (delta G less than -11.0 Kcal/mole); antibodies
bind tightly to antigens (Kd's range from
10^-[illegible] to 10^-[illegible] M; Kd is the
dissociation constant, equal to [A][B]/[A:B]); basic
bovine pancreatic trypsin inhibitor (BPTI) binds
tightly to trypsin (Kd = 6.0 x 10^-14 M (TSCH87),
delta G = -18.0 Kcal/mole); and avidin binds to biotin
(Kd = 1.3 x 10^-15 M (CREI84, p362)).
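
The dissociation constants and free energies quoted for these complexes are linked by the standard thermodynamic relation delta G = RT ln(Kd). The following is a minimal sketch of that arithmetic, not part of the original disclosure. It assumes a temperature of 25 C (298.15 K), which the text does not state, and the function name is purely illustrative.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar, temp_kelvin=298.15):
    """Return delta G of binding (kcal/mol) from a dissociation constant Kd (M).

    Uses delta G = R*T*ln(Kd). The 298.15 K default is an assumption,
    not a value taken from the text.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temp_kelvin * math.log(kd_molar)

# Kd values as quoted in the paragraph above
for name, kd in [("BPTI-trypsin", 6.0e-14), ("avidin-biotin", 1.3e-15)]:
    dg = binding_free_energy(kd)
    print(f"{name}: Kd = {kd:.1e} M, delta G = {dg:.1f} kcal/mol")
```

Under the stated temperature assumption, the BPTI-trypsin figure comes out close to the -18.0 Kcal/mole given in the text, and the tighter avidin-biotin complex gives a more negative value.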

In each case the binding results from
complementarity of the surfaces that come into contact:
bumps fit into holes, unlike charges come together,
dipoles align, and hydrophobic atoms contact other
hydrophobic atoms. Although bulk water is excluded,
individual water molecules are frequently found filling
space in intermolecular interfaces; these waters
usually form hydrogen bonds to one or more atoms of the
protein or to other bound water. Thus proteins found
in nature have not attained, nor do they require,
perfect complementarity to bind tightly and
specifically to their substrates. Only in rare cases
is there essentially perfect complementarity; then the
binding is extremely tight (as, for example, avidin
binding to biotin).

The relative importance of electrostatic vs.
hydrophobic interactions is not fully understood
(SCHU79, p105). Attraction between oppositely charged
groups apparently contributes little to the free energy
of binding between proteins and other molecules.
Like-charged groups can, however, increase specificity:
repulsion of like-charged groups in the binding
interface or even unpaired charges in the interface can
greatly reduce or eliminate binding in instances where
shape and hydrophobic interactions would otherwise
induce it.

It has been observed, however, that proteins can
bind to other molecules such that like-charged groups
are juxtaposed; in such instances repulsion is reduced
or eliminated by inclusion of oppositely charged ions
in the binding interface. An example of this
phenomenon is the inclusion of two positively charged
calcium ions between each pair of subunits of turnip
crinkle virus (HOGL83). The subunits each contain two
negatively charged D (single-letter amino acid codes
are given in Table 1) and E residues in close
proximity.

The factors affecting protein binding are known
(CHOT75, CHOT76, SCHU79, p98-107, and CREI84, Ch8), but
designing new complementary surfaces has proved
difficult. Although some rules have been developed for
substituting side groups (SUTC87b), the side groups of
proteins are floppy and it is difficult to predict what
conformation a new side group will take. Further, the
forces that bind proteins to other molecules are all
relatively weak and it is difficult to predict the
effects of these forces.

Recently, Quiocho and collaborators (QUIO87)
elucidated the structures of several periplasmic
binding proteins from Gram-negative bacteria. They
found that the proteins, despite having low sequence
homology and differences in structural detail, have
certain important similarities. Each of the proteins
they investigated is composed of two domains that are
joined by three strands of protein. The binding site
is located between the two domains and is isolated from
bulk solvent. The structure of the binding site is
dense and highly ordered, and binding constants are
very high. The researchers suggest that binding of
ligands causes a conformational change that alters the
relative positions of the two domains.

The researchers found that each of the periplasmic
binding proteins has numerous residues (seven or more)
arrayed about the binding site. Surprisingly, ionic
ligands are not bound by ionic side groups of opposite
charge, but by main-chain components. Electrical
charge seems to be neutralized by dipole interactions.
Further, hydrophobic contacts play an important role in
binding.

Based on their investigations of these binding
proteins, Quiocho et al. suggest it is unlikely that,
using current protein engineering methods, proteins can
be constructed with binding properties superior to
those of proteins that occur naturally.

Wilkinson et al. (WILK84) have found, however,
that enzyme-substrate affinity may be increased by
protein engineering. They reported that a mutant of
tyrosyl tRNA synthetase of Bacillus stearothermophilus
that has proline at residue 51 exhibits a 100-fold
increase in affinity for ATP.

Substitution of one amino acid for another at a
surface locus may profoundly alter binding properties
of the protein other than substrate binding without
affecting the tertiary structure of the protein. For
example, in sickle-cell haemoglobin the change of the
surface residue E6 to V in the beta chains causes
deoxyhaemoglobin-S to form fibers through self binding
(DICK83, p125-145). Love and others have shown that
the tertiary and quaternary structure of the
haemoglobin are not changed (PADL85, WISH75, WISH76).

Tan and Kaiser (TANK77) and Tschesche et al.
(TSCH87) showed that changing a single amino acid in
BPTI greatly reduces its binding to trypsin, but that
some of the new molecules retain the parental
characteristics of binding to and inhibiting
chymotrypsin, while others exhibit new binding to
elastase. Caruthers and others (EISE85) have shown
that changes of single amino acids on the surface of
the lambda Cro repressor greatly reduce its affinity
for the natural operator OR3, but greatly increase the
binding of the mutant protein to a mutant operator.
Thus changing the surface of a binding protein may
alter its specificity without abolishing binding
activity.

The recently developed techniques of "reverse
genetics" have been used to produce single specific
mutations at precise base pair loci (OLIP86, OLIP87,
and AUSU87). Mutations are generally detected by
sequencing and in some cases by loss of wild-type
function. These procedures allow researchers to
analyze the function of each residue in a protein
(MILL88) or of each base in a regulatory DNA
sequence (CHEN88). In these analyses, the norm has
been to strive for the classical goal of obtaining
mutants carrying a single alteration (AUSU87).

Reverse genetics is frequently applied to coding
regions to determine which residues are most important
to the protein structure and function. In such
studies, isolation of a single mutant at each residue
of the protein gives an initial estimate of which
residues play crucial roles.

Prior to the method of the present invention, two
general approaches have been developed to create novel
mutant proteins through reverse genetics. Both methods
start with a clone of the gene of interest. In one
approach, dubbed "protein surgery" (reviewed by Dill
(DILL87)), a specific substitution is introduced at a
single protein residue by a synthetic method using the
corresponding natural or synthetic cloned gene. Craik
et al. (CRAI85), Rao et al. (RAOS87), and Bash et al.
(BASH87) have used this approach to determine the
effects on structure and function of specific
substitutions in trypsin.

The other approach has been to generate a variety
of mutants at many loci within the cloned gene, the
"gene-directed random mutagenesis" method. The
specific location and nature of the change are
determined by DNA sequencing. It may be possible to
screen for mutations if loss of a wild-type function
confers a cellular phenotype. Using
immunoprecipitation, one can then differentiate among
mutant proteins that: a) fold but fail to function, b)
fail to fold but persist, and c) are degraded, perhaps
due to failure to fold. This approach is exemplified
by the work of Pakula et al. (PAKU86) on the effect of
point mutations on the structure and function of the
Cro protein from bacteriophage lambda. This approach
is limited by the number of colonies that can be
examined. An additional important limitation is that
many desirable protein alterations require multiple
amino acid substitutions and thus are not accessible
through single base changes or even through all
possible amino acid substitutions at any one residue.

The objective in both the surgical and gene-directed
random mutagenesis approaches has been,
however, to analyze the effects of a variety of single
substitution mutations, so that rules governing such
substitutions could be developed (ULME83). Progress
has been greatly hampered by the extensive efforts
involved in using either method and the practical
limitations on the number of colonies that can be
inspected (ROBE86).

The term "saturation mutagenesis" with reference
to synthetic DNA is generally taken to mean generation
of a population in which: a) every possible single-base
change within a fragment of a gene or DNA regulatory
region is represented, and b) most mutant genes contain
only one mutation. Thus a set of all possible single
mutations for a 6 base pair length of DNA comprises a
population of 18 mutants. Oliphant et al. (OLIP86) and
Oliphant and Struhl (OLIP87) have demonstrated ligation
and cloning of highly degenerate oligonucleotides and
have applied saturation mutagenesis to the study of
promoter sequence and function. They have suggested
that similar methods could be used to study genetic
expression of protein coding regions of genes, but they
do not say how one should: a) choose protein residues
to vary, or b) select or screen mutants with desirable
properties.
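
The size of a saturation-mutagenesis population follows from simple enumeration: each of the n positions can change to any of the three other bases, so there are 3n single-base variants. The sketch below illustrates this counting for a 6 base pair stretch. It is illustrative only; the example sequence and function name are arbitrary choices, not taken from the cited work.

```python
BASES = "ACGT"

def single_base_mutants(seq):
    """Return every sequence that differs from `seq` at exactly one position."""
    seq = seq.upper()
    mutants = []
    for i, original in enumerate(seq):
        for base in BASES:
            if base != original:
                # substitute one base, leave all other positions unchanged
                mutants.append(seq[:i] + base + seq[i + 1:])
    return mutants

example = "GAATTC"  # arbitrary 6-base sequence, used only for illustration
variants = single_base_mutants(example)
print(len(variants))  # 3 alternatives at each of 6 positions
```

The count grows only linearly with length for single changes, which is why populations of single mutants are small compared with populations in which several positions vary at once.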

Reidhaar-Olson and Sauer (REID88a) have used
synthetic degenerate oligo-nts to vary simultaneously
two or three residues through all twenty amino acids in
the dimer interface of cI repressor from bacteriophage
lambda. They give no discussion of the limits on how
many residues could be varied at once nor do they
mention the problem of unequal abundance of DNA
encoding different amino acids. They looked for
proteins that either had wild-type dimerization or that
did not dimerize. They did not seek proteins having
novel binding properties and did not find any.
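
The problem of unequal abundance can be made concrete with the standard genetic code. If a codon is fully randomized, some amino acids are encoded by six codons and others by only one, so the corresponding proteins are not equally represented. The sketch below is an illustration only; it assumes an equal mixture of all four bases at each codon position, which the cited work may not have used.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def codon_counts():
    """Count how many of the 64 codons encode each amino acid ('*' = stop)."""
    return Counter(CODON_TABLE.values())

counts = codon_counts()
for aa in "LSRMW*":
    print(aa, counts[aa])

# Distinct protein and DNA sequences when n residues are fully varied
for n in (2, 3):
    print(n, 20 ** n, 64 ** n)
```

With fully random codons, leucine, serine and arginine each have six codons while methionine and tryptophan have one, and three codons are stops. Varying three residues gives 8000 possible proteins spread unevenly over 262144 DNA sequences.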

Several researchers have designed and synthesized
proteins de novo. These designed proteins are small
and most have been synthesized in vitro as polypeptides
rather than genetically. Gutte and colleagues have
made a polypeptide that binds DDT in [illegible]% ethanol
(MOSE83). Recently Moser et al. (MOSE87) reported
genetic expression in E. coli both of the designed 24
residue DDT-binding protein and of fusions of the
DDT-binding sequence to LacZ. They state that design of
biologically active proteins is currently impossible.

Erickson et al. (ERIC86) have designed and
synthesized a series of proteins that they have named
betabellins, that are meant to have beta sheets. They
suggest use of polypeptide synthesis with mixed
reagents to produce several hundred analogous
betabellins. They suggest the mixture be passed over a
column to recover the analogues with high affinity for
a chosen target compound bound to the column. They
envision successive rounds of mixed synthesis of
variant proteins and purification by specific binding.
They do not discuss how residues should be chosen for
variation. Because proteins cannot be amplified, the
researchers must sequence the recovered protein to
learn which substitutions improve binding. The
researchers must limit the level of diversity so that
each variety of protein will be present in sufficient
quantity for the isolated fraction to be sequenced.

A number of methods have been developed to
separate cells through their affinity to various
substances. Bonnafous et al. (BONN85) review methods
that have been applied to animal cells, and cite two
common problems: a) non-specific interactions between
cells and affinity supports, and b) irreversible
binding of cells to affinity matrices. Possible
reasons for irreversible binding include multiple
points of attachment and very high affinity between
cells and antibodies used as affinity materials.
Chromatographic separation of animal cells is still
difficult because of their fragility. Bacterial cells,
bacterial spores, and some bacteriophage, however, are
sturdier than animal cells and have been fractionated
based on proteins displayed on their surfaces.

Ferenci and collaborators have published a series
of papers on the chromatographic isolation of mutants
of the maltose-transport protein LamB of E. coli
(WAND79, FERE80a, FERE80b, FERE80c, FERE82a, FERE82b,
FERE83, CLUN84, FERE86a, FERE86b, FERE86c, FERE87a,
FERE87b, HEIN87, and HEIN88). The papers report that
spontaneous and induced mutants at the lamB genetic
locus can be isolated by chromatography over a column
supporting immobilized maltose, maltodextrins, or
starch, i.e. carbohydrates that could be metabolized by
the bacteria. The reports speculate that other
applications are possible, but specifically mention
only the elucidation of the residues responsible for
the selectivity of the maltodextrin pore or similar
pore proteins.

Ferenci's experiments measured the combination of
the individual affinity of mutant LamB molecules and
the level of expression. Several classes of mutants in
lamB were isolated. One class had higher affinities
for both maltose and starch, one class had lower
affinity for starch but higher affinity for maltose,
and another class had higher affinity for starch but
lower affinity for maltose.

Mutants were generated either by hydroxylamine
treatment of a plasmid carrying the entire gene, or by
insertions of two extra codons at natural HpaII sites.
Levels of mutagenesis were picked to provide single
point mutations or single insertions of two residues.
No multiple mutations were sought or found.

LamB is a large trimeric integral membrane
protein; such proteins are very difficult to
crystallize or even to solubilize. Therefore it is
difficult to use single-crystal protein X-ray
crystallography or NMR to obtain detailed 3D structural
information. Garavito et al. (GARA83) have obtained
crystals of LamB that diffract X-rays, but the 3D
structure of the protein has not been determined.

There are models (CHAR87, HEIN88) that include the
secondary structure of LamB, i.e. they specify which
residues are in beta-sheet conformation, which residues
are in turns, and which residues are on the outside, in
the periplasm, or in the membrane. These models do not
specify how the beta sheets are arranged nor which
turns are close to which other turns.

Ferenci and Lee (FERE86a) reported on the
temperature sensitivity of carbohydrate binding in B.
stearothermophilus. At higher temperatures, the
organism breaks down the polysaccharide, the binding of
which was the object of the study. Clune, Lee, and
Ferenci (CLUN84) reported that presence of complete
O-antigen affected the binding properties of LamB on the
surface of E. coli. Both of these reports point up the
difficulties of working with live bacteria that can
metabolize chemicals and change their physiological
behavior during the chromatographic experiment. Heine
et al. (HEIN88) have used the chemotaxis of E. coli
recently to isolate mutants in lamB that are unaffected
in chemotaxis; this approach is limited to metabolites
that affect chemotaxis.

Makela et al. (MAKE80) reviewed methods that
involve chemically coupling antigens to bacteriophage
to produce a sensitive, quantitative detection system
for antibodies. The methods reviewed exploit the
ability to amplify the signal produced by antibodies
binding to the antigens coupled to the phage, through
growth of the phage. The antigens were joined to the
phage chemically and not encoded in the genes of the
phage. Thus there was no sorting of genetic material.
Furthermore, the objectives of the methods reviewed
involve titering the phage that fail to bind, as an
assay of antibody. The methods of the present
invention, in most cases, involve growth and
amplification of genetic packages that bind with high
affinity.



In 1985 Smith (SMIT85) reported inserting a
heterologous gene into gene III of bacteriophage f1.
The gene III protein is a minor coat protein necessary
for infectivity. In some cases the inserted gene
preserved the original reading frame, leading to
expression of heterologous protein as an inserted
domain in the gene III protein. Smith demonstrated
that virions of the resulting strain of f1 are adsorbed
by antibody against the protein encoded by the
heterologous DNA. The antibody was bound to a
polystyrene petri dish. The phage were eluted at pH
2.2 and retained some infectivity. However, the single
copy of f1 gene III was used for insertion of the
heterologous gene so that all copies of gene III
protein were affected; infectivity of the resultant
phage was reduced 25-fold. Smith also demonstrated
that batch elution from a plate can separate f1 virions
that differ by only a few protein domains on their
surfaces.



Smith presented his method as a way to isolate
cloned genes using antibodies to the gene products. He
made no mention of mutagenizing the inserted genetic
material or of inducing novel binding properties in the
inserted protein domain.

De la Cruz et al. (CRUZ88) have expressed a
fragment of the repeat region of the circumsporozoite
protein from Plasmodium falciparum on the surface of
M13 as an insert in the gene III protein. They showed
that the recombinant phage were both antigenic and
immunogenic in rabbits, and that such recombinant phage
could be used for B epitope mapping. The researchers
suggest that similar recombinant phage could be used
for T epitope mapping and for vaccine development.
They do not suggest mutagenesis of the inserted
material.

Gene fragments coding for portions of hepatitis B
virus antigens have been fused to fragments of lamB.
If the point of fusion is in a region coding for
exposed domains of LamB, the HBV antigens appear on the
cell surface and are immunogenic (CHAR87). Charbit et
al. (CHAR87) suggest use of these engineered strains
for development of a live bacterial vaccine; they have
not reported interest in mutagenesis of the fused
heterologous gene fragments, nor in development of
binding capabilities.

Recently Tjian and colleagues (KADO86, BRIG87, and
JONE87) have shown that DNA of definite sequence bound
to an affinity column can be used to purify proteins
that bind the DNA sequence-specifically. The proteins
are purified as much as 1000-fold in two
chromatographic steps or [figure illegible]-fold in a
single step.

Patents and patent applications which may be of
interest include US Patent No. 4,704,692, "Computer
Based System and Method for Determining and Displaying
Possible Chemical Structures for Converting Double- or
Multiple-Chain Polypeptides to Single-Chain
Polypeptides" (Ladner '692), issued to Robert Charles
Ladner on 3 November 1987 and assigned to Genex
Corporation. Ladner '692 describes a design method for
converting proteins composed of two or more chains into
proteins of fewer polypeptide chains, but with
essentially the same 3D structure. There is no
mention of variegated DNA and no genetic selection.

Robert Charles Ladner also has six patent
applications pending before the USPTO and assigned to
Genex Corporation:

07/92,110
07/21,046
07/21,047
07/34,964
07/34,965
07/34,966

Sonia K. Guterman is named as a joint inventor on
US Patent No. 4,745,055 ("Streptomyces Secretion
Vector") and on Ser. No. 21,465.

None of the Ladner or Guterman patents or
applications is believed to disclose or suggest the
present invention, but it is requested that each be
considered by the Examiner.

No admission is made that any cited reference is
prior art or pertinent prior art, and the dates given
are those appearing on the reference and may not be
identical to the actual publication date.

SUMMARY OF THE INVENTION

This invention relates to the construction,
expression, and selection of mutated genes that specify
novel proteins with desirable binding properties, as
well as these proteins themselves. The substances
bound by these proteins, hereinafter referred to as
"targets", may be, but need not be, proteins. Targets
may include other biological or synthetic
macromolecules as well as organic and inorganic
molecules.

The novel binding proteins may be obtained: 1) by
mutating a gene encoding a known binding protein within
the subsequence encoding a known binding domain, or 2)
by taking such a subsequence of the gene for a first
protein and combining it with all or part of a gene for
a second protein (which may or may not be itself a
known binding protein), 3) by mutating a gene encoding
a protein which, while not possessing a known binding
activity, possesses a secondary or higher structure
that lends itself to binding activity (clefts, grooves,
etc.), or 4) by mutating a gene encoding a known
binding protein but not in the subsequence known to
cause the binding. The protein from which the novel
binding protein is derived need not have any specific
affinity for the target material.

In one embodiment, the invention relates to:

a) preparing a variegated population of replicable
genetic packages, each package including a nucleic
acid construct coding on expression for an
outer-surface-displayed potential binding protein
comprising (i) a structural signal directing the
display of the protein on the outer surface of the
package and (ii) a potential binding domain for
binding said target, where a plurality of
different potential binding domains are displayed
by the individual packages;

The invention further relates to a method of
preparing a mixed population of replicable genetic
packages in which each package includes a gene
expressing a potential binding protein in such a manner
that the protein is presented on the outer surface of
the package. This method comprises:

i) preparing a variegated population of DNA
inserts each of which comprises a first
sequence which codes on expression for a potential
binding domain and a second sequence encoding a
signal directing that the encoded protein be
displayed on the outer surface of a chosen
replicable genetic package, and

ii) incorporating the resulting population of DNA
constructs into the chosen replicable genetic
packages to produce a population of replicable
genetic packages.

In a preferred embodiment, the
potential-binding-protein-encoding inserts are incorporated
into a gene encoding an outer-surface protein of the
replicable genetic package.

The invention encompasses the design and synthesis
of variegated DNA encoding a family of potential
binding proteins characterized by constant and variable
regions, said proteins being designed with a view
toward obtaining a protein that binds a predetermined
target.

For the purposes of this invention, the term
"potential binding protein" refers to a protein encoded
by one species of DNA molecule in a population of
proteins that in fact bind to the target ("successful
binding domains"). After one or more rounds of such
enrichment, one or more of the chosen genes are
examined and sequenced. If desired, new loci of
variation are chosen. The selected daughter genes of
one generation then become the parent sequences for the
next generation of variegated DNA, beginning the next
"variegation cycle." Such cycles are continued until a
protein with the desired target affinity is obtained.
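
The iterative logic of repeated variegation cycles can be sketched as a toy simulation. This is not the method of the invention: real selection is by binding of genetic packages to a target, whereas here a hypothetical scoring function (`toy_affinity`, matching against a hidden optimum) stands in for affinity. The population size, the rule for choosing loci, and all names are assumptions made only for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_affinity(seq, hidden_best):
    """Hypothetical stand-in for binding: count positions matching a hidden optimum."""
    return sum(a == b for a, b in zip(seq, hidden_best))

def variegate(parent, loci, size, rng):
    """Build a population from `parent`, randomizing only the chosen loci."""
    population = []
    for _ in range(size):
        residues = list(parent)
        for i in loci:
            residues[i] = rng.choice(AMINO_ACIDS)
        population.append("".join(residues))
    return population

def run_cycles(parent, hidden_best, loci_per_cycle=2, size=2000,
               max_cycles=30, seed=0):
    """Repeat variegation and selection; the selected sequence seeds the next cycle."""
    rng = random.Random(seed)
    n = len(parent)
    for cycle in range(max_cycles):
        if toy_affinity(parent, hidden_best) == n:
            return parent, cycle
        # choose new loci of variation (here: simply step along the sequence)
        start = (cycle * loci_per_cycle) % n
        loci = [(start + k) % n for k in range(loci_per_cycle)]
        population = variegate(parent, loci, size, rng)
        # "selection": keep the best scorer, but only if it improves on the parent
        best = max(population, key=lambda s: toy_affinity(s, hidden_best))
        if toy_affinity(best, hidden_best) > toy_affinity(parent, hidden_best):
            parent = best
    return parent, max_cycles

final, cycles_used = run_cycles("AAAAAA", "KWYDEL")
print(final, cycles_used)
```

Varying only a few loci per cycle keeps each population small enough to sample thoroughly (400 combinations for two loci), while successive cycles accumulate improvements across the whole sequence.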

The appended claims are hereby incorporated by
reference into this specification as an enumeration of
the preferred embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a schematic showing the relationships
between various types of Binding Domains (BD).

Figure 2 is a flow chart showing the major steps used
to create a novel protein with affinity for a
pre-determined target.

Figure 3 is a stereo view of a molecular model of the
coat of the bacteriophage f1.

Figure 4 is a schematic of a PBD contacting a molecule
of target material.

Figure 5 is a stereo view of a hypothetical interaction
between BPTI and myoglobin.

Figure 6 is a schematic of the binding surface of a PBD
at various stages in the process of selecting a
successful binding domain for a hypothetical target.
1.1     Bacterial Cells as Genetic Packages
1.1.1   Preferred Bacterial Cells for Use as GPs
1.1.2   Preferred Outer Surface Proteins for
        Displaying IPBDs on Bacterial Cells
1.1.3   Choice of Insertion Site for IPBD in
        Bacterial Cell OSP
1.1.4   In vivo Selection for Pseudo-OSP Gene from
        Random DNA Inserts in Bacterial Cells

Displaying IPBD on Bacterial Spores
Preferred Bacterial Spores for Use as GPs
Preferred Outer-Surface Proteins for
Displaying IPBD on Bacterial Spores
Choice of Insertion Site for IPBD in OSP
In vivo Selection for Pseudo-OSP Gene from
Random DNA Inserts in Bacterial Spores

Displaying IPBD on Outer Surface of Phages
Preferred Phages for Use as GPs
Preferred OSPs for Displaying IPBDs on Phages
Choice of Insertion Site for IPBD in OSP
In vivo Selection for Pseudo-OSP Gene from
Random DNA Inserts in Phages

Choice of IPBD
Influence of target size on choice of IPBD
Influence of target charge on choice of IPBD
Other considerations in the choice of IPBD

Choice of OCV

Designing the osp-ipbd gene insert
Genetic regulation of the osp-ipbd gene
DNA sequence design
Specific DNA sequence assignment
13.1.2  The Secondary Set
13.1.3  Choice of Residues to Vary Initially
13.2    Choosing range of variation
13.3    Design of vgDNA Encoding PBD Family
14.1    Insertion of synthetic vgDNA into plasmids
14.2    Transformation of cells
14.3    Growth of the GP(vgPBD) population
15.     Isolation of GP(PBD)s with binding-to-target
        phenotypes
15.1    Attaching the target material to a column
15.2    Reducing selection due to non-specific
        binding
15.3    Eluting the column
15.4    Recovery of packages
15.5    Amplifying the enriched packages
15.6    Determining whether further enrichment is
        needed
15.7    Characterizing population
15.8    Testing of binding affinity
15.9    Other Affinity Separation Means
16.0    The Next Variegation Cycle
17.0    OTHER CONSIDERATIONS
17.1    Joint selections
17.2    Selection for non-binding
17.3    Selection of PBDs for retention of structure
17.4    Created binding proteins not unique
17.5    Other modes of mutagenesis possible

Example I   Derivation of Novel Binding Protein for
            Myoglobin Using BPTI as IPBD, M13 as GP, and
            the Gene VIII Protein as OSP.



bind a chosen target, it is referred to herein as a
"binding domain" (BD). A preliminary operation is to
engineer the appearance of a stable protein domain,
denoted as an "initial potential binding domain"
(IPBD), on the surface of a genetic package. The
present invention is concerned with the expression of
numerous, diverse, variant "potential binding domains"
(PBD), all related to a "parental potential binding
domain" (PPBD) such as the binding domain of a known
binding protein, and with selection and amplification
of the genes encoding the most successful mutant PBDs.
An IPBD is chosen as PPBD to the first round of
variegation. Selection-through-binding isolates one or
more "successful binding domains" (SBD). An SBD from
one round of variegation and selection-through-binding
is chosen to be the PPBD for the next round. The
invention is not, however, limited to proteins with a
single BD since the method may be applied to any or all
of the BDs of the protein, sequentially or
simultaneously. The relationships of the various BDs
are illustrated in Figure 1.

Conventionally, DNA sequences are written from 5'
to 3', left-to-right, showing only the sequence that
will appear as mRNA (with each T of DNA changed to U in
mRNA).

     protein:               M  -  L  -  F  -

     anti-sense DNA:   5'  ATG  CTT  TTC ... 3'

     sense DNA:        3'  TAC  GAA  AAG ... 5'

     mRNA:             5'  AUG  CUU  UUC ... 3'
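
The strand relationships in the three-codon example can be
checked mechanically. The sketch below is illustrative only:
the function names are introduced here, and the codon table
is deliberately limited to the three codons that appear in
the example, so it is not a general translator.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Only the three codons used in the example above.
CODON_TABLE = {"AUG": "M", "CUU": "L", "UUC": "F"}


def template_strand(written_5to3):
    """Complement of the written strand, read 3'->5' so the bases line up."""
    return "".join(COMPLEMENT[base] for base in written_5to3)


def transcribe(written_5to3):
    """mRNA has the written sequence with every T replaced by U."""
    return written_5to3.replace("T", "U")


def translate(mrna):
    """Translate an mRNA whose codons are all present in CODON_TABLE."""
    codons = [mrna[i:i + 3] for i in range(0, len(mrna), 3)]
    return "-".join(CODON_TABLE[codon] for codon in codons)
```

Here `template_strand` returns the strand that the text
calls the "sense strand", i.e. the one copied by RNA
polymerase.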

The complementary strand is the one used as template
for mRNA synthesis and so is called the "sense strand";
we will use this convention throughout. Although this
the analyte can be freed from the affinity material
once the impurities are washed away.

Affinity column chromatography involves chemically
attaching the affinity material to an inert solid
support matrix that is held in a column so that
solutions can be passed over the matrix in a controlled
way. Mixtures that might contain the analyte are
passed over the matrix to which any analyte component
in the mixture adheres. Separation is achieved by
passing a gradient of some type over the matrix and
collecting fractions. It is also possible to recover
purified material from the matrix by other means after
impurities have been washed away.

An alternative to column affinity chromatography
is batch elution from an affinity matrix material held
in some container. Affinity material is chemically
bound to the matrix. A mixture that might contain the
analyte is added and the matrix is rinsed with buffer.
The material is rinsed with a series of buffers
containing increasing concentrations of solutes chosen
to wash impurities away. The analyte is recovered in
purified form either in one of the buffer fractions or
bound to the matrix.

Another alternative to column affinity
chromatography is batch elution from a plate. The affinity
material can be chemically bound to a flat surface,
such as the bottom of a polystyrene petri dish. A
mixture that might contain the analyte is added to the
plate and the plate is rinsed with a buffer.
Subsequently, the plate is washed with a series of
buffers containing increasing concentrations of solutes
chosen to separate components having lower affinity for
or cells. It has been used to separate bacteriophages
on the basis of charge (SERW87).

The present invention makes use of affinity
separation of bacterial cells, or bacterial viruses (or
other genetic packages) to enrich a population for
those cells or viruses carrying genes that code for
proteins with desirable binding properties.

In the present invention, the words "grow",
"growth", "culture", and "amplification" mean increase
in number, not increase in size of individual cells or
phage. In the present invention, the words "select"
and "selection" are used in the genetic sense; i.e. a
biological process whereby a phenotypic characteristic
is used to enrich a population for those organisms
displaying the desired phenotype. Choices or elections
to be made by humans are indicated by "choose", "pick",
"take", etc., but not "select".

The process of the present invention comprises
three major parts:

I. design and production of a replicable
genetic package (GP) that displays an IPBD on
the surface of the GP; the combination is
denoted GP(IPBD),

II. design and implementation of an affinity
separation process that separates GP(IPBD)s
that bind to a known affinity molecule from
wild-type GPs or GP(IPBD-)s, neither of which
binds the known affinity molecule, and



3) designing an amino acid sequence that: a)
includes the IPBD as a subsequence and b) will
cause the IPBD to appear on the GP surface (Secs.
1.1.2, 1.2.2, 1.3.2, and 4),

4) engineering a gene, denoted osp-ipbd, that: a)
codes for the designed amino acid sequence, b)
provides the necessary genetic regulation, and c)
introduces convenient sites for genetic
manipulation (Secs. 4.1, 4.2, 4.3, 5.1, and 5.2),

5) cloning the osp-ipbd gene into the GP (Sec.
6.1), and

6) harvesting the transformed GPs (Sec. 7) and
testing them for presence of IPBD on the GP
surface (Sec. 3); this test is performed with an
affinity molecule having high affinity for IPBD,
denoted AfM(IPBD).

In another preferred embodiment, Part I of the process
involves:

1) choosing a GP such as a bacterial cell (Sec.
1.1.1), bacterial spore (1.2.1), or phage (1.3.1)
having a suitable outer surface protein (Secs.
1.1.2, 1.2.2 and 1.3.2),

2) choosing a stable IPBD (Sec. 2),

3) designing a DNA sequence that: a) encodes the
IPBD as a subsequence and b) contains suitable
restriction sites so that random DNA may be
operably linked to the ipbd gene fragment; and c)
domain. References to PBD or pbd in Part I are to
indicate a preparatory intent.

In Part II we optimize separation of GP(IPBD) from
wild-type GP, denoted wtGP, based on the affinity of
IPBD for AfM(IPBD). To establish the sensitivity of
the affinity separation process, we separate small
amounts of GP(IPBD) from much larger amounts of wtGP.
In a preferred embodiment, Part II of the process of
the present invention involves:

1) preparing affinity columns bearing AfM(IPBD) at
various densities of AfM(IPBD)/(volume of matrix)
(Sec. 10.1),

2) preparing GP(IPBD)s with various amounts of
IPBD per GP,

3) picking a gradient regime for eluting the
columns (Sec. 10.1),

4) determining which combination of: a) IPBD/GP,
b) density of AfM(IPBD)/(volume of support), c)
initial ionic strength, d) elution rate, and e)
(amount of GP)/(volume of support) loaded, gives
the best separation of GP(IPBD) from wtGP (Sec.
10.1),

5) determining the smallest amount of GP(IPBD)
that can be isolated from a much larger amount of
wtGP using the optimal condition (Sec. 10.2), and

6) determining the efficiency of the affinity
separation procedure (Sec. 10.3).
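
Step 4 above amounts to trying every combination of the
listed variables and keeping the one that separates best. A
minimal sketch of that bookkeeping follows. It assumes the
experimenter supplies the levels to be tried and a function
returning a measured separation score for one condition; the
names `condition_grid`, `best_condition`, and
`measure_separation` are hypothetical and introduced here.

```python
from itertools import product


def condition_grid(levels):
    """Every combination of the given levels, as a list of dicts.

    `levels` maps a variable name (e.g. initial ionic strength)
    to the list of values to be tried for that variable.
    """
    names = list(levels)
    value_lists = [levels[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


def best_condition(levels, measure_separation):
    """Return the combination with the highest separation score.

    `measure_separation` stands for an experimental measurement;
    it is called once per combination.
    """
    return max(condition_grid(levels), key=measure_separation)
```

The number of combinations is the product of the numbers of
levels, so five variables at three levels each already means
243 column runs; this is one reason to pick the levels with
care.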
3) picking a set of several residues in the PPBD
to vary; the principal indicators of which
residues to vary include: a) the 3D structure of
the IPBD, b) sequences of homologous proteins, and
c) computer or theoretical modeling that indicates
which residues can tolerate different amino acids
without disrupting the underlying structure (Sec.
13.1),

4) picking a subset of the residues picked in Part
III.3, to be varied simultaneously (Sec. 13.1);
the principal considerations are the number of
different variants and which variants are within
the detection capabilities of the affinity
separation determined in Part II, and setting the
range of variation (Sec. 13.2);

5) implementing the variegation by:

a) synthesizing the part of the osp-pbd gene
that encodes the residues to be varied using a
specific mixture of nucleotide substrates for
some or all of the bases encoding residues
slated for variation, thereby creating a
population of DNA molecules, denoted vgDNA
(Sec. 13.3),

b) ligating this vgDNA, by standard methods,
into the operative cloning vector (OCV) (e.g.
a plasmid or bacteriophage) (Sec. 14.1),

c) using the ligated DNA to transform cells,
thereby producing a population of transformed
cells (Sec. 14.2),
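
The effect of a nucleotide mixture on a varied codon can be
worked out by enumeration, which also gives the population
size that the affinity separation must cope with. The sketch
below uses the mixture written NNK in IUPAC notation (any
base at the first two positions, G or T at the third). NNK is
a commonly used mixture chosen here only as an example; it is
an assumption of this sketch, not a mixture specified in the
text. The function names are likewise introduced here.

```python
from itertools import product

# IUPAC symbols needed for this example.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(mixed_codon):
    """List every DNA codon that a mixed codon such as 'NNK' can produce."""
    return ["".join(p) for p in product(*(IUPAC[s] for s in mixed_codon))]


def encoded(mixed_codon):
    """Count how many DNA codons encode each amino acid ('*' is a stop)."""
    counts = {}
    for codon in expand(mixed_codon):
        aa = CODON_TABLE[codon]
        counts[aa] = counts.get(aa, 0) + 1
    return counts


def dna_population_size(mixed_codon, residues_varied):
    """Number of distinct DNA sequences when several codons are varied."""
    return len(expand(mixed_codon)) ** residues_varied
```

With equal base proportions the amino acids are not equally
represented (leucine has three NNK codons, methionine one),
which is why the choice of mixture matters as much as the
choice of residues.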







Abbreviation     Meaning

GP               Genetic Package, e.g. a bacteriophage
wtGP             Wild-type GP
X                Any protein
x                The gene for protein X
IPBD             Initial Potential Binding Domain, e.g. BPTI
PBD              Potential Binding Domain, e.g. a derivative
                 of BPTI
SBD              Successful Binding Domain, e.g. a derivative
                 of BPTI selected for binding to a target
PPBD             Parental Potential Binding Domain, i.e. an
                 IPBD or an SBD from a previous selection
OSP              Outer Surface Protein, e.g. coat protein of
                 a phage or LamB from E. coli
OSP-PBD          Fusion of an OSP and a PBD, order of fusion
                 not specified
OSTS             Outer Surface Transport Signal
GP(x)            A genetic package containing the x gene
GP(X)            A genetic package that displays X on its
                 outer surface
GP(osp-pbd)      GP containing an osp-pbd gene
GP(OSP-PBD)      A genetic package that displays PBD on its
                 outside as a fusion to OSP
GP(pbd)          GP containing a pbd gene, osp implicit
GP(PBD)          A genetic package displaying PBD on its
                 outside, OSP unspecified
[...]            An affinity matrix supporting "Q", e.g.
                 [T4 lysozyme] is T4 lysozyme attached to an
                 affinity matrix
AfM(W)           A molecule having affinity for "W", e.g.
                 trypsin is an AfM(BPTI)
AfM(W)*          AfM(W) carrying a label, e.g. 125I
[...]            A chemical that can induce expression of a
                 gene, e.g. IPTG for the lacUV5 promoter
OCV              Operative Cloning Vector
K_T              K_T = [T][SBD]/[T:SBD] (T is a target)
K_N              K_N = [N][SBD]/[N:SBD] (N is a non-target)
[...]            Density of AfM(W) on affinity matrix
[...]            Most-Favored amino acid
[...]            Least-Favored amino acid
[...]            Abundance of molecules encoding amino acid x
[...]            Outer membrane protein
[...]            nucleotide
K_d              A bimolecular dissociation constant,
                 K_d = [A][B]/[A:B]
[...]            Signal-sequence Peptidase I
[...]            Yield of ssDNA up to Q bases long
[...]            Maximum length of ssDNA that can be
                 synthesized in acceptable yield
[...]            Yield of plasmid DNA per volume of culture
[...]            DNA ligation efficiency
[...]            Maximum number of transformants produced
                 from Y_D100 DNA of insert
[...]            Efficiency of chromatographic enrichment,
                 enrichment per pass
[...]            Sensitivity of chromatographic separation,
                 can find 1 in [...]
[...]            Maximum number of enrichment cycles per
                 variegation cycle
[...]            Error level in synthesizing
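
The dissociation constants defined in the table fix how much
of a binding domain is occupied at a given concentration of
its partner. As a rough illustration, rearranging
K_d = [A][B]/[A:B] gives the bound fraction of B as
[A]/([A] + K_d). The sketch below assumes simple 1:1 binding
at equilibrium and a known free concentration of A; the
function name is introduced here.

```python
def fraction_bound(free_a, k_d):
    """Fraction of B present as the A:B complex.

    free_a: free concentration of A; k_d: dissociation constant,
    in the same units. Assumes 1:1 binding at equilibrium.
    """
    if free_a < 0 or k_d <= 0:
        raise ValueError("free_a must be >= 0 and k_d must be > 0")
    return free_a / (free_a + k_d)
```

When the free concentration equals K_d, half of B is bound;
a tenfold lower K_d at the same concentration raises that to
about 91 percent.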



Sec. 0.?: Standard sequencing method:

The present invention is not limited to a single
method of determining the sequence of nucleotides (nts)
in DNA subsequences. In the preferred embodiment,
plasmids are isolated and denatured in the presence of
a sequencing primer, about 20 nts long, that anneals to
a region adjacent, on the 5' side, to the region of
interest. This plasmid is then used as the template in
the four sequencing reactions with one dideoxy
substrate in each. Sequencing reactions, agarose gel
electrophoresis, and polyacrylamide gel electrophoresis
(PAGE) are performed by standard procedures (AUSU87).

The present invention is not limited to a single
method of determining protein sequences, and reference
in the appended claims to determining the amino acid
sequence of a domain is intended to include any
practical method or combination of methods, whether
direct or indirect. The preferred method, in most
cases, is to determine the sequence of the DNA that
encodes the protein and then to infer the amino acid
sequence. In some cases, standard methods of
protein-sequence determination may be needed to detect
post-translational processing.



The major steps in the process of making and
isolating a novel binding protein with affinity for a
chosen target material are illustrated in Figure 2.

Sec. 1: Specification of Genetic Package and Means for
Displaying a Heterologous Binding Domain On Its Outer
Surface:

Sec. 1.0: General Requirements for Genetic Packages



It is emphasized that the GP on which
selection-through-binding will be practiced must be capable,
after the selection, either of growth in some suitable
environment or of in vitro amplification and recovery
of the encapsulated genetic message. During at least
part of the growth, the increase in number must be
approximately exponential with respect to time. The
component of a population that exhibits the desired
binding properties may be quite small, for example, one
in 10^[...] or less. Once this component of the population
is separated from the non-binding components, it must
be possible to amplify it. Culturing viable cells is
the most powerful amplification of genetic material
known and is preferred. Genetic messages can also be
amplified in vitro, but this is not preferred.
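
The requirement of exponential growth can be made concrete
by counting doublings. The sketch below is illustrative: the
function name is introduced here, and the population sizes in
the usage note are example numbers, not values from the
text.

```python
def doublings_needed(start_count, target_count):
    """Smallest number of doublings taking start_count to >= target_count."""
    if start_count <= 0:
        raise ValueError("start_count must be positive")
    doublings = 0
    count = start_count
    while count < target_count:
        count *= 2
        doublings += 1
    return doublings
```

For example, a single recovered package needs 20 doublings
to give at least a million copies. With linear growth the
same amplification would take about a million growth steps,
which is why approximately exponential increase is required.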

A GP may typically be a vegetative bacterial cell,
a bacterial spore or a bacterial DNA virus. A strain
of any living cell or virus is potentially useful if
the strain can be:

1) maintained in culture,

2) affinity separated and retain its viability,

3) genetically altered with reasonable facility,
and

4) manipulated to display the potential binding
protein domain where it can interact with the
target material during affinity separation.



We believe that it is possible to cause a genetic
package to display the IPBD or PBD on its outer surface
without adversely affecting the viability of the GP or
the binding characteristics of the IPBD or PBD.

It is generally believed that the part of the
polypeptide chain composing one domain folds almost
independently of the parts composing other domains.
There are natural proteins composed of two or more
domains for which there is strong evidence that
essentially the same domain occurs more than once, for
example ovomucoids and ovoinhibitors (SCOT87) and
kallikrein (CHUN86). Furthermore, essentially the same
domain can occur in several different proteins (SUDH85,
GILB85, and SCOT87).



Rossmann (ROSS81) and others have pointed out that
the 3D structure of individual domains can be preserved
during protein evolution, even after the amino acid
sequences have diverged so much that no significant
homology can be detected. Hollecker and Creighton
(HOLL83) studied the folding pathways of two black
mamba venom proteins (called I and K) that are
homologous to BPTI. Although the sequences of I and K
are clearly related to BPTI by the identity of 19 and
23 residues respectively, including all six cysteine
residues, there are 33 and 34 differences. Not only
are the 3D structures of the proteins very similar, but
the pathway of folding has also been conserved.

When gene fragments coding for two domains from
different proteins have been joined by genetic
engineering and expressed, the domains from the
original proteins sometimes fold independently while
tethered to each other (TOTH86, SMIT85, MANO86). If
the insertion is the gene for the entire protein, that
protein may be converted into a domain of the larger
protein. Fusions of genes that determine the domains,
however, must be done at or near domain junctions, or
domain function may be impaired (TOTH86). In
some cases, the inserted domain will fold, but the
recipient protein will not; Beckwith's fusions of malF
and phoA genes (BECK83, MANO86) gave rise to functional
PhoA domains attached to a fragment of MalF that
anchored the chimeric protein in the lipid bilayer.
The MalF protein was incomplete and could not function.

There are two basic methods of arranging that the
ipbd gene is expressed in such a manner that the IPBD
is displayed on the outer surface of the GP.

First, DNA encoding the IPBD sequence may be
operably linked to DNA encoding all or part of an outer
surface protein (OSP) native to the GP. If one or more
fusions of fragments of x genes to fragments of a
natural osp gene are known to cause X protein domains
to appear on the GP surface, then we pick the DNA
sequence in which an ipbd gene fragment replaces the x
gene fragment in one of the successful osp-x fusions as
a preferred gene to be tested for the display-of-IPBD
phenotype. (The gene may be constructed in any
manner.) If no fusion data are available, then we fuse
an ipbd fragment to various fragments, such as
fragments that end at known or predicted domain
boundaries, of the osp gene and obtain GPs that display
the osp-ipbd fusion on the GP outer surface by
screening or selection for the display-of-IPBD
phenotype. The fusion of ipbd and osp fragments may
also include fragments of random or pseudorandom DNA to
produce a population, members of which may display IPBD
on the GP surface. The members displaying IPBD are
isolated by screening or selection for the display-of-
binding phenotype.

While most bacterial proteins remain in the
cytoplasm, others are transported to the periplasmic
space (which lies between the plasma membrane and the
cell wall of gram-negative bacteria), or are conveyed
and anchored to the outer surface of the cell. Still
others are exported (secreted) into the medium
surrounding the cell. Those characteristics of a
protein that are recognized by a cell and that cause it
to be transported out of the cytoplasm and displayed on
the cell surface will be termed "outer-surface
transport signals".

It is believed that the conditions for an outer
surface transport signal are not particularly
stringent, i.e., a random polypeptide of appropriate
length (preferably 30-100 amino acids) has a reasonable
chance of providing such a signal. Thus, by
constructing a chimeric gene comprising a segment
encoding the IPBD linked to a segment of random or
pseudorandom DNA (the potential OSTS), and placing this
gene under control of a suitable promoter, there is a
possibility that the chimeric protein so encoded will
function as an OSP-IPBD.

This possibility is greatly enhanced by
constructing numerous such genes, each having a
different potential OSTS, cloning them into a suitable
host, and selecting for transformants bearing the IPBD
(or other marker) on their outer surface.

The replicable genetic entity (phage or plasmid)
that carries the osp-pbd genes (derived from the osp-
ipbd gene) through the selection-through-binding
process, see Sec. 11, is referred to hereinafter as the
operative cloning vector (OCV). When the OCV is a
phage, it may also serve as the genetic package. The
choice of a GP is dependent in part on the availability
of a suitable OCV and suitable OSP.

Preferably, the GP is readily stored, for example,
by freezing. If the GP is a cell, it should have a
short doubling time, such as 20-40 minutes. If the GP
is a virus, it should be prolific, e.g., a burst size
of at least 100/infected cell. GPs which are finicky
or expensive to culture are disfavored. The GP should
be easy to harvest, preferably by centrifugation. The
GP is preferably stable for a temperature range of -70
to 42°C (stable at 4°C for several days or weeks);
resistant to shear forces found in HPLC; insensitive to
UV; tolerant of desiccation; and resistant to a pH of
2.0 to 10.0, surface active agents such as SDS or
Triton, chaotropes such as 4M urea or 2M guanidinium
HCl, common ions such as K+, Na+, and SO4--, common
organic solvents such as ether and acetone, and
degradative enzymes. Finally, there must be a suitable
OCV (see Sec. 3).

Although knowledge of specific OSPs may not be
required for vegetative bacterial cells and endospores,
the user of the present invention, preferably, will
know: Is the sequence of any osp known? (preferably
yes, at least one required for phage). How does the
OSP arrive at the surface of GP? (knowledge of route
necessary, different routes have different uses, no
route preferred per se). Is the OSP
post-translationally processed? (no processing most
preferred, predictable processing preferred over
unpredictable processing). What rules are known
governing this processing, if there is any processing?
(no processing most preferred, predictable processing
acceptable). What function does the OSP serve in the
outer surface? (preferably not essential). Is the 3D
structure of an OSP known? (highly preferred). Are
fusions between fragments of osp and a fragment of x
known? Does expression of these fusions lead to X
appearing on the surface of the GP? (fusion data is as
preferred as knowledge of a 3D structure). Is a "2D"
structure of an OSP available? (in this context, a "2D"
structure indicates which residues are exposed on the
cell surface) (2D structure less preferred than 3D
structure). Where are the domain boundaries in the
OSP? (not as preferred as a 2D structure, but
acceptable). Could IPBD go through the same process as
OSP and fold correctly? (IPBD might need prosthetic
groups) (preferably IPBD will fold after same
process). Is the sequence of an osp promoter known?
(preferably yes). Is an osp gene controlled by a
regulatable promoter available? (preferably yes). What
activates this promoter? (preferably a diffusible
chemical, such as IPTG). How many different OSPs do we
know? (the more the better). How many copies of each
OSP are present on each package? (more is better).

The user will want knowledge of the physical
attributes of the GP: How large is the GP? (knowledge
useful in deciding how to isolate GPs) (preferably easy
to separate from soluble proteins such as IgGs). What
is the charge on the GP? (neutral preferred). What is
the sedimentation rate of the GP? (knowledge preferred,
no particular value preferred).

The preferred GP, OCV and OSP are those for which
the fewest serious obstacles can be seen, rather than
the one that scores highest on any one criterion.
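
The selection rule just stated can be illustrated with a small sketch. The candidate names, the criteria, the scores, and the threshold below which a score counts as a "serious obstacle" are all hypothetical and chosen only for illustration; the text prescribes no numeric scoring.

```python
# Hypothetical illustration: pick the GP/OCV/OSP combination with the
# fewest serious obstacles, not the one with the single highest score.
# All names, scores, and the threshold are assumptions for this sketch.

SERIOUS_THRESHOLD = 2  # scores below this count as a serious obstacle


def count_serious_obstacles(scores):
    """Count criteria whose score falls below the threshold."""
    return sum(1 for value in scores.values() if value < SERIOUS_THRESHOLD)


def choose_candidate(candidates):
    """Return the candidate name with the fewest serious obstacles.

    Ties are broken in favor of the higher total score.
    """
    return min(
        candidates,
        key=lambda name: (
            count_serious_obstacles(candidates[name]),
            -sum(candidates[name].values()),
        ),
    )


candidates = {
    # Highest single score (5), but two serious obstacles.
    "candidate_A": {"osp_sequence_known": 5, "fusion_data": 1, "stability": 1},
    # No outstanding score, but no serious obstacle either.
    "candidate_B": {"osp_sequence_known": 3, "fusion_data": 3, "stability": 3},
}

best = choose_candidate(candidates)
print(best)
```

In this made-up example the second candidate is chosen even though the first holds the highest individual score, which is the behavior the criterion describes.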

Next, we consider general answers to the questions
posed in this step for the cases of: a) vegetatively
growing bacterial cells (Sec. 1.1), b) bacterial spores
(Sec. 1.2), and c) phage (Sec. 1.3). Preferred OSPs for
several GPs are given in Table 2.



Sec. 1.1.1: Preferred Bacterial Cells as GP:

One may choose any well-characterized bacterial
strain which may be grown in culture. The important
questions in this case are: a) do we know enough about
mechanisms that localize proteins on the outside of the
cell, b) will the IPBD fold in the environment of the
outer membrane, and c) will cells change expression of
osp-pbd, derived from osp-ipbd, during affinity
separation? Some IPBDs may need large or insoluble
prosthetic groups, such as haem or an Fe4S4 cluster,
that are available within the cell, but not in the
medium. The formation of Fe4S4 clusters found in some
ferredoxins is catalyzed by enzymes found in the cell
(BONO85). IPBDs that require such prosthetic groups
may fail to fold or function if displayed on bacterial
cells.

The species chosen should have a well-
characterized genetic system and strains defective in
genetic recombination should be available. The chosen
strain may need to be manipulated to prevent changes of
its physiological state that would alter the number or
type of proteins or other molecules on the cell surface
during the affinity separation procedure. In view of
the extensive knowledge of E. coli, a strain of E.
coli, defective in recombination, is the strongest
candidate as a bacterial GP. Other preferred
candidates are Salmonella typhimurium, Bacillus
subtilis, and Pseudomonas aeruginosa.

Induction of synthesis of engineered genes in
vegetative bacterial cells has been exercised through
the use of regulated promoters such as lacUV5, trpP, or
tac (MANI82). The factors that regulate the quantity
of protein synthesized include: a) promoter strength
(cf. HOOP87), b) rate of initiation of translation (cf.
GOLD87), c) codon usage, d) secondary structure of
mRNA, including attenuators (cf. LAND87) and
terminators (cf. YAGE87), e) interaction of proteins
with mRNA (cf. MCPH86, MILL87b, WINT87), f) degradation
rates of mRNA (cf. SUBB88), g) proteolysis (cf.
GOTT87). These factors are sufficiently well
understood that a wide variety of heterologous proteins
can now be produced in E. coli or B. subtilis in at
least moderate quantities (SKER88, BETT88).

Sec. 1.1.2: Preferred Outer Surface Proteins for
Displaying IPBDs on Bacterial Cells:



Gram-negative bacteria have outer-membrane
proteins (OMP), that form a subset of OSPs. Many OMPs
span the membrane one or more times. The signals that
cause OMPs to localize in the outer membrane are
encoded in the amino acid sequence of the mature
protein. Fusions of fragments of omp genes with
fragments of an x gene have led to X appearing on the
outer membrane (BENS84, CLEM81). The rules that govern
the localization of OMP-X fusion proteins are not yet
fully elucidated. Many OMPs are polymeric and non-
essential; a non-essential OMP is preferred. A non-
essential OMP for which there is knowledge of which
residues are on the cell surface is more preferred. A
non-essential OMP for which there is data showing that
X is displayed as part of an OMP-X fusion is most
preferred. If no fusion data are available, then we
fuse an ipbd fragment to various fragments of the osp
gene and obtain GPs that display the osp-ipbd fusion on
the cell outer surface by screening or selection for
the display-of-IPBD phenotype.

Oliver has reviewed mechanisms of protein
secretion in bacteria (OLIV85 and OLIV87). Nikaido and
Vaara (NIKA87) have reviewed mechanisms by which
proteins become localized to the outer membrane of
Gram-negative bacteria. For example, the LamB protein
of E. coli is synthesized with a typical signal-
sequence which is subsequently removed. Benson et al.
(BENS84) showed that LamB-LacZ fusion proteins would be
deposited in the outer membrane of E. coli when
residues 1-49 of the mature LamB protein are included
in the fusion, but that residues 1-43 are insufficient.
The rules that govern localization of proteins in the
outer membrane of Gram-negative bacteria remain vague.
Kaiser et al. (KAIS87) showed that the export signal in
Saccharomyces cerevisiae is very broad, because when
they fused random human DNA sequences to DNA coding for
mature invertase, about one fifth of the sequences
resulted in the appearance of invertase free in the
medium.

The outer membrane protein LamB of E. coli is a
porin for maltose and maltodextrin transport, and
serves as the receptor for adsorption of bacteriophages
lambda and K10. This protein has been purified to
homogeneity (ENDE78) and shown to function as a trimer
(PALV79). Mutations to phage resistance have been used
to define the parts of the LamB protein that adsorb
each phage (ROAM80, CLEM81, CLEM83, GEHR87). Phage-
resistance mutations are dominant (MARC83), suggesting
that there is no preferential assembly of wild-type or
mutant subunits.

In lamB+ cells, addition of maltose or
maltodextrin inhibits a form of motility called cell
swarming, and lamB mutants defective in this process
have been characterized (HEIN88). These mutations have
been sequenced and compared to the wild-type sequence
(CLEM81) and the concomitant protein domains have been
analyzed (CLEM83). Topological models have been
developed that describe the function of phage receptor
and maltodextrin transport. The models describe these
domains and their locations with respect to the
surfaces of the outer membrane (CHAR84, HEIN88).

LamB is transported to the outer membrane if a
functional N-terminal sequence is present; further, the
first 49 amino acids of the mature sequence are
required for successful transport (BENS84). Homology
between parts of LamB protein and other outer membrane
proteins OmpC, OmpF and PhoE has been detected
(NIKA84), including homology between LamB amino acids
39-49 and sequences of the other proteins. These
subsequences may label the proteins for transport to
the outer membrane. Further, monoclonal antibodies
derived from mice immunised with purified LamB have
been used to characterise four distinct topological and
functional regions, two of which are concerned with
maltose transport (GABA82).

General knowledge on processing of signal
sequences in E. coli is relevant to the present
invention both for use of E. coli per se and for use in
conjunction with filamentous phage (vide infra).
Genetic experiments on processing of signal sequences
indicate that if the S21-F22-A23 sequence is preserved,
signal peptidase (SP-I) will cleave after A23 (OLIV87).
Many examples have been cited in which the DNA coding
for the leader or signal sequence from one protein has
been attached to the DNA sequence coding for another
protein, protein X (BECK83, INOU86 Ch10, LEEC86,
MARK86, and BOQU87). Expression of such a chimeric
gene often causes protein X to appear free in the
periplasm. That is, the leader causes the new protein
to be secreted through the lipid bilayer; once in the
periplasm, it is cleaved off by SP-I.

Beckwith (BECK83 and MANO86) has shown that when
the phoA gene is inserted in frame into the coding
sequence for an integral membrane protein, for example
MalF, the PhoA domain is localized according to
where in the integral membrane protein the phoA gene
was inserted. That is, if phoA is inserted after an
amino acid which normally is found in the cytoplasm,
then PhoA appears in the cytoplasm. If phoA is
inserted after an amino acid normally found in the
periplasm, however, then the PhoA domain is localized
on the periplasmic side of the membrane, and anchored
in it.

Beckwith and colleagues (BECK83) have extended
these observations to the lacZ gene that can be
inserted into genes for integral membrane proteins such
that the LacZ domain appears in either the cytoplasm or
the periplasm according to where the lacZ gene was
inserted.

Sec. 1.1.3: Choice of Insertion Site for IPBD:

OSP-IPBD fusion proteins need not fill a
structural role in the outer membranes of Gram-negative
bacteria because parts of the outer membranes are not
highly ordered. For large OSPs there is likely to be
one or more sites at which osp can be truncated and
fused to ipbd such that cells expressing the fusion
will display IPBDs on the cell surface. If fusions
between fragments of omp and x have been shown to
display X on the cell surface, we can design an osp-
ipbd gene by substituting ipbd for x in the DNA
sequence. Otherwise, a successful OMP-IPBD fusion is
preferably sought by fusing fragments of the best omp
to an ipbd, expressing the fused gene, and testing the
resultant GPs for the display-of-IPBD phenotype. We use
the available data about the OMP to pick the point or
points of fusion between omp and ipbd to maximize the
likelihood that IPBD will be displayed. Alternatively,
we truncate omp at several sites or in a manner that
produces omp fragments of variable length and fuse the
omp fragments to ipbd; cells expressing the fusion are
screened or selected which display IPBDs on the cell
surface. An additional alternative is to include short
segments of random DNA in the fusion of omp fragments
to ipbd and then screen or select the resulting
variegated population for members exhibiting the
display-of-IPBD phenotype.

The promoter for the osp-ipbd gene, preferably, is
subject to regulation by a small chemical inducer, such
as isopropyl thiogalactoside (IPTG) (lac promoter). It
need not come from a natural osp gene; any regulatable
bacterial promoter can be used.



Once a genetic packaging system employing [...]
comprising: [...] c) a periplasmic transport signal,
[...] a segment of random DNA [...], a stop codon, and
[...] terminator. [...] previously [...] known OSPs
[...] to encode an OSTS. The fusion of ipbd and the
random DNA could be in either order, but ipbd upstream
is slightly preferred. Isolates from the population
generated in this way can be screened for display of
the IPBD. Preferably, a version of selection-through-
binding is used to select GPs that display IPBD on the
GP surface. Alternatively, clonal isolates may be
screened for the display-of-IPBD phenotype.

The preference for ipbd upstream of the random DNA
arises from consideration of the manner in which the
successful GP(IPBD) will be used. We will introduce
numerous mutations into the pbd region of the osp-pbd
gene, some of which include gratuitous stop codons. If
pbd precedes the random DNA, then gratuitous stop
codons in pbd lead to no OSP-PBD protein appearing on
the cell surface. If pbd follows the random DNA, then
gratuitous stop codons in pbd might lead to incomplete
OSP-PBD proteins appearing on the cell surface.
Incomplete proteins often are non-specifically sticky
so that GPs displaying incomplete PBDs are easily
removed from the population.
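
To get a rough feel for how often gratuitous stop codons arise, the following sketch estimates the chance that a run of in-frame codons contains no stop codon. It assumes every codon is drawn independently and uniformly from all 64 triplets, which is an assumption made here for illustration only; natural DNA and practical variegation schemes are not uniform. The lengths 30 and 100 are taken from the 30-100 amino acid range mentioned earlier for random segments.

```python
# Rough estimate, assuming every codon in the segment is drawn
# independently and uniformly from all 64 triplets (an assumption made
# for this sketch; real DNA and real variegation schemes are not uniform).

STOP_CODONS = {"TAA", "TAG", "TGA"}
TOTAL_CODONS = 64


def prob_no_stop(n_codons):
    """Probability that n_codons random in-frame codons contain no stop."""
    if n_codons < 0:
        raise ValueError("n_codons must be non-negative")
    p_sense = (TOTAL_CODONS - len(STOP_CODONS)) / TOTAL_CODONS
    return p_sense ** n_codons


for n in (30, 100):
    print(f"{n} codons: P(no stop) = {prob_no_stop(n):.4f}")
```

Under this simplified model the open-reading-frame fraction falls quickly with length, which is consistent with the strategy of building many constructs and selecting the ones that display.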

The random DNA can be generated from any DNA
having high sequence diversity by partially digesting
with an enzyme that cuts very often. Sau3A I, for
example, generates cohesive-ended DNA that can be
cloned into a BamH I or Bgl II site. Alternatively,
one could shear DNA having high sequence diversity,
blunt the sheared DNA with the large fragment of E.
coli DNA polymerase I (hereinafter referred to as
Klenow fragment), and clone the sheared and blunted DNA
into blunt sites of the vector (MANI82, p295, AUSU87:
5.1.1).
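
The partial-digestion idea can be sketched in a few lines. The sketch below locates Sau3A I recognition sites (GATC, cut immediately 5' of the G) on one strand and cuts each site with a chosen probability. The function names, the single-strand simplification, and the example sequence are assumptions made for illustration and do not come from the text.

```python
import random

SAU3A_SITE = "GATC"  # Sau3A I cuts immediately 5' of the G in GATC


def find_sites(seq):
    """Return 0-based start positions of every GATC site in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(SAU3A_SITE) + 1)
            if seq[i:i + len(SAU3A_SITE)] == SAU3A_SITE]


def partial_digest(seq, cut_probability, rng=None):
    """Cut the top strand at each site with the given probability.

    Returns the list of fragments; cut_probability=1.0 models complete
    digestion and cut_probability=0.0 leaves the sequence intact.
    """
    rng = rng or random.Random()
    cuts = [pos for pos in find_sites(seq) if rng.random() < cut_probability]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if a != b]


example = "AAGATCTTGATCAA"  # made-up sequence with two GATC sites
print(find_sites(example))
print(partial_digest(example, 1.0))
```

Lower cut probabilities give longer, more varied fragments, which is the point of digesting only partially with a frequent cutter.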

Sec. 1.2: Displaying IPBD on bacterial spores:

Bacterial spores have desirable properties as GP
candidates. Bacillus spores neither actively
metabolize nor alter the proteins on their surface.
However, spores are much more resistant than vegetative
bacterial cells or phage to chemical and physical
agents. Spores have the disadvantage that the
molecular mechanisms that trigger sporulation are less
well worked out than is the formation of M13 or the
export of protein to the outer membrane of E. coli.

Sec. 1.2.1: Preferred Bacterial Spores for GPs:

Bacteria of the genus Bacillus form endospores
that are extremely resistant to damage by heat,
radiation, desiccation, and toxic chemicals (reviewed
by Losick et al. (LOSI86)). These spores have complex
structure and morphogenesis that is species-specific
and only partially elucidated. The following
observations are relevant to the use of Bacillus spores
as genetic packages for the purposes of the present
invention.

Plasmid DNA is commonly included in spores.
Plasmid encoded proteins have been observed on the
surface of Bacillus spores (DEBR86). Sporulation
involves complex temporal regulation that is now
moderately well understood (LOSI86). Special sigma
factors, such as sigma^E, are produced during
sporulation. RNA polymerase bound to a sporulation
sigma factor recognizes promoters that are not
recognized by RNA polymerase bound to a vegetative
sigma factor. The sequences of several sporulation
promoters are known; coding sequences operatively
linked to such promoters are expressed only during
sporulation. Ray et al. (RAYC87) have shown that the
G4 promoter of B. subtilis is directly controlled by
RNA polymerase bound to sigma^E.

Donovan et al. have identified several polypeptide
components of B. subtilis spore coat (DONO87); the
sequences of two complete coat proteins and amino-
terminal fragments of two others have been determined.
Some components of the spore are synthesized in the
forespore, e.g. small acid-soluble spore proteins
(ERRI88), while other components are synthesized in the
mother cell and appear in the spore (e.g. the coat
proteins). This spatial organization of synthesis is
controlled at the transcriptional level.

Spores self-assemble, but the signals that cause
various proteins to localize in different parts of the
spore are not well understood; presumably, the signals
controlling deposition of the coat proteins from the
cytoplasm of the mother cell onto the spore coat are
embedded in the polypeptide sequence. Some, but not
all, of the coat proteins are synthesized as precursors
and are then processed by specific proteases before
deposition in the spore coat (DONO87). Viable spores
that differ only slightly from wild-type are produced
in B. subtilis even if any one of four coat proteins is
missing (DONO87). Disulfide bonds form within the
spore (thiol reducing agents are needed to solubilize
several of the proteins of the coat). The 12kd coat
protein, CotD, contains 5 cysteines. CotD also
contains an unusually high number of histidines (16)
and prolines (7). The 11kd coat protein, CotC,
contains only one cysteine and one methionine. CotC
has a very unusual amino-acid sequence with 19 lysines
(K) appearing as 9 K-K dipeptides and one isolated K.
There are also 20 tyrosines (Y) of which 10 appear as 5
Y-Y dipeptides. Peptides rich in Y and K are known to
become crosslinked in oxidizing environments (DEVO78,
WAIT83, WAIT85, WAIT86). CotC contains 16 D and E
amino acids, which nearly equals the 19 Ks. There are no
A, F, N, I, L, P, Q, S, or W amino acids in CotC.
Neither CotC nor CotD is post-translationally cleaved.
The proteins CotA and CotB are post-translationally
cleaved.
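
The residue bookkeeping in the paragraph above (for example, 19 lysines present as 9 K-K dipeptides plus one isolated K) can be checked mechanically. The sketch below counts, for one residue type, the total number of copies, the number of non-overlapping adjacent pairs, and the number of isolated copies. The toy sequence is made up for illustration and is not the CotC sequence; the function names are likewise hypothetical.

```python
def dipeptide_summary(sequence, residue):
    """Return (total, pairs, isolated) for one residue type.

    Scans left to right; two adjacent copies of the residue form one
    dipeptide, and pairs do not overlap.
    """
    total = pairs = isolated = 0
    i = 0
    while i < len(sequence):
        if sequence[i] == residue:
            if i + 1 < len(sequence) and sequence[i + 1] == residue:
                pairs += 1
                total += 2
                i += 2
                continue
            isolated += 1
            total += 1
        i += 1
    return total, pairs, isolated


def counts_consistent(total, pairs, isolated):
    """Check that total = 2 * pairs + isolated."""
    return total == 2 * pairs + isolated


toy = "KKAYYKDKKEYY"  # made-up sequence, not CotC
print(dipeptide_summary(toy, "K"))
print(dipeptide_summary(toy, "Y"))

# Counts quoted in the text for lysine: 19 total, 9 pairs, 1 isolated.
print(counts_consistent(19, 9, 1))
```

Applied to a real coat-protein sequence, the same function would report how much of the lysine and tyrosine content sits in adjacent pairs.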

Endospores from the genus Bacillus are more stable
than are exospores from Streptomyces. Bacillus
subtilis forms spores in 4 to 6 hours, but Streptomyces
species may require days or weeks to sporulate. In
addition, genetic knowledge and manipulation is much
more developed for B. subtilis than for other spore-
forming bacteria. Thus Bacillus spores are preferred
over Streptomyces spores. Bacteria of the genus
Clostridium also form very durable endospores, but
Clostridia, being strict anaerobes, are not convenient
to culture. The choice of a species of Bacillus is
governed by knowledge and availability of cloning
systems and by how easily sporulation can be
controlled. A particular strain is chosen by the
criteria listed in Sec. 1.0. Spores are exposed to an
oxidative environment after release from the mother
cell, so that disulfides, if any, within the IPBD might
form. Many vegetative biochemical pathways are shut
down when sporulation begins so that prosthetic groups
might not be available.

Sec. 1.2.2: Preferred Outer-Surface Proteins for
Displaying IPBD on Bacterial Spores:

If a spore is chosen as GP, the promoter is the
most important part of the osp gene, because the
promoter of a spore coat protein is most active: a)
when spore coat protein is being synthesized and
deposited onto the spore and b) in the specific place
that spore coat proteins are being made. In B.
subtilis, some of the spore coat proteins are post-
translationally processed by specific proteases. It is
valuable to know the sequences of precursors and mature
coat proteins so that we can avoid incorporating the
recognition sequence of the specific protease into our
construction of an OSP-IPBD fusion. The sequence of a
mature spore coat protein contains information that
causes the protein to be deposited in the spore coat;
thus gene fusions that include some or all of a mature
coat protein sequence are preferred for screening or
selection for the display-of-IPBD phenotype.

Fusions of ipbd fragments to cotC or cotD
fragments are likely to cause IPBD to appear on the
spore surface. The genes cotC and cotD are preferred
osp genes because CotC and CotD are not post-
translationally cleaved. Subsequences from cotA or
cotB could also be used to cause an IPBD to appear on
the surface of B. subtilis spores, but we must take the
post-translational cleavage of these proteins into
account. DNA encoding IPBD could be fused to a
fragment of cotA or cotB at either end of the coding
region or at sites interior to the coding region.
Spores could then be screened or selected for the
display-of-IPBD phenotype.

To date, no Bacillus sporulation promoter has been
shown to be inducible by an exogenous chemical inducer
as the lac promoter of E. coli is. Nevertheless, the
quantity of protein produced from a sporulation
promoter can be controlled by other factors, such as
the DNA sequence around the Shine-Dalgarno sequence or
codon usage. Chemically inducible sporulation
promoters can be developed if necessary.

Sec. 1.2.3: Choice of Insertion Site for IPBD in OSP
of Bacterial Spore:

The considerations governing insertion site in the
spore OSP are the same as those given in Section 1.1.3.

Sec. 1.2.4: In Vivo Selection for Pseudo-osp Genes
from Random DNA Inserts in Bacterial Spores:

Although the considerations for spores are nearly
identical to the considerations for vegetative
bacterial cells (Sec. 1.1), the available information
on the mechanisms that cause proteins to appear on
spores is meager so that use of the random-DNA approach
becomes a more attractive option.

We can use the approach described above at 1.1.4
for attaching an IPBD to an E. coli cell, except that:
a) a sporulation promoter is used, and b) no
periplasmic signal sequence should be present.

Sec. 1.3: Displaying IPBD on Outer Surface of Phage:

Sec. 1.3.1: Preferred Phage for Use as GPs:

Unlike bacterial cells and spores, choice of a
phage depends strongly on knowledge of the 3D structure
of an OSP and how it interacts with other proteins in
the capsid. The size of the phage genome and the
packaging mechanism are also important because the
phage genome itself is the cloning vector. The osp-
ipbd gene must be inserted into the phage genome;
therefore:

1) the virion must be capable of accepting the
insertion or substitution of genetic material, and

2) the genome of the phage must be small enough to
allow convenient manipulation.

Additional considerations in choosing phage are:

1) the morphogenetic pathway of the phage
determines the environment in which the IPBD will
have opportunity to fold,

2) IPBDs containing essential disulfides may not
fold within a cell,

3) IPBDs needing large or insoluble prosthetic
groups may not fold if secreted because the
prosthetic group is lacking, and

4) when variegation is introduced in Part III,
multiple infections could generate hybrid GPs that
carry the gene for one PBD but have at least some
copies of a different PBD on their surfaces; it is
preferable to minimize this possibility.

Bacteriophages are excellent candidates for GPs
because there is little or no enzymatic activity
associated with intact mature phage, and because the
genes are inactive outside a bacterial host, rendering
the mature phage particles metabolically inert. The
filamentous phage M13 and bacteriophage PhiX174 are of
particular interest.

Filamentous phage:

The entire life cycle of the filamentous phage
M13, a common cloning and sequencing vector, is well
understood. M13 and f1 are so closely related that we
consider the properties of each relevant to both
(RASC86); any differentiation is for historical
accuracy. The genetic structure (the complete sequence
(SCHA78), the identity and function of the ten genes,
and the order of transcription and location of the
promoters) of M13 is well known as is the physical
structure of the virion (BANN81, BOEK80, CHAN79,
ITOK79, KAPL78, KUHN85a, KUHN87, MAKO80, MARV78,
MESS78, OHKA81, RASC86, RUSS81, SCHA78, SMIT85, WEBS78,
and ZIMM82); see RASC86 for a recent review of the
structure and function of the coat proteins.

Filamentous phage enter E. coli through the sex
pilus of cells bearing the F-factor. Achtman et al.
(ACHT78) observed that the pilus is extraordinarily
sensitive to SDS; 0.03% SDS inhibits binding of M13 to
pilin in vitro. Infection may therefore be inhibited
by SDS.

The 50 amino acid mature coat protein is
synthesized as a 73 amino acid precoat (ITOK79). The
first 23 amino acids constitute a typical signal-
sequence which causes the nascent polypeptide to be
inserted into the inner cell membrane.

An E. coli signal peptidase (SP-I) recognizes
amino acids 18, 21, and 23, and, to a lesser extent,
residue 22, and cuts between residues 23 and 24 of the
precoat (KUHN85a, KUHN85b, OLIV87). (See also sec.
1.1.2 for general knowledge on secretion in E. coli.)
After removal of the signal sequence, the amino
terminus of the mature coat is located on the
periplasmic side of the inner membrane; the carboxy
terminus is on the cytoplasmic side. About 3000 copies
of the mature 50 amino acid coat protein associate
side-by-side in the inner membrane.
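
The processing step described above amounts to cutting a 73-residue precoat after residue 23 to leave a 50-residue mature protein. The following sketch shows that arithmetic on a placeholder sequence. The placeholder is not the real M13 precoat sequence; only its length and the S21-F22-A23 motif are taken from the text, and the function name is hypothetical.

```python
SIGNAL_LENGTH = 23  # residues 1-23 form the signal sequence (from the text)


def cleave_precoat(precoat):
    """Split a precoat into (signal, mature) by cutting after residue 23."""
    if len(precoat) <= SIGNAL_LENGTH:
        raise ValueError("precoat is too short to contain a signal sequence")
    return precoat[:SIGNAL_LENGTH], precoat[SIGNAL_LENGTH:]


# Placeholder 73-residue precoat: NOT the real M13 sequence. Only the
# overall length and the S21-F22-A23 motif before the cut follow the text.
placeholder = "M" + "L" * 19 + "SFA" + "A" + "G" * 49

signal, mature = cleave_precoat(placeholder)
print(len(placeholder), len(signal), len(mature))
print(signal[-3:])
```

A real analysis would substitute the published precoat sequence for the placeholder; the cut position stays the same.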

The gene Vi, vn, and rx proteins are also present 
at the ends of the virion in about five copies e.^ch. 
The s ir.gle-stron.-iod circular phage D:'A associates vith 
about five copies of the gene til protein and is then 
extruded thro-jgh the cotch of ne.-brare-assoc ir-.tcd ccat 
protein in such a voy th.it the DN'A is encased in a 
helical sheath of protein (WEaS7S) . 0!iA dees not 

base pair (that vould i.-pose severe restrictions on the 
virus genome); rather the bases intercalate with each 
other independent of sequence. Because the .Ml 3 geno«.c 
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because PhiX174 can accept alriost no addlcional DNA; 
the virus is so ciqMtly constrained that several of its 
genes overlap. Char.be rc et al . {CHA-Ma2) sho-'cd that 
mutants in gene C are rescued by the vi Id-type G gene 



t 
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is extrvided through rhe r:eabrane and coated by a large ^ 

number of identic.il protein molecules, it can be used [: ' 

as a cloning vector (WATS37 p273 , and y.ZSSll) . Thus ue | 

can insert extra genes into Ml 3 and they vill be f' / 

carried along in a stable r.anner. f- 

Marvin and collaborators (MARV73, MAKO86, BANN81)
have determined an approximate 3D virion structure of
f1 by a combination of genetics, biochemistry, and
X-ray diffraction from fibers of the virus. Figure 3 is
drawn after the model of Banner et al. (BANN81) and
shows only the C-alphas of the protein. The apparent
holes in the cylindrical sheath are actually filled by
protein side groups so that the DNA within is
protected. The amino terminus of each protein monomer
is to the outside of the cylinder, while the carboxy
terminus is at smaller radius, near the DNA. Although
other filamentous phages (e.g. Pf1 or Ike) have
different helical symmetry, all have coats composed of
many short alpha-helical monomers with the amino
terminus of each monomer on the virion surface.

Bacteriophage PhiX174:

The bacteriophage PhiX174 is a very small
icosahedral virus which has been thoroughly studied by
genetics, biochemistry, and electron microscopy (see
The Single-Stranded DNA Phages (DENH78)). To date, no
proteins from PhiX174 have been studied by X-ray
diffraction. PhiX174 is not used as a cloning vector






carried on a plasmid so that the host supplies this
protein.



Three gene products of PhiX174 are present on the
outside of the mature virion: F (capsid), G (major
spike protein, 60 copies per virion), and H (minor
spike protein, 12 copies per virion). The G protein
comprises 175 amino acids, while H comprises 328 amino
acids. The F protein interacts with the single-
stranded DNA of the virus. The proteins F, G, and H
are translated from a single mRNA in the virally
infected cells.

Large DNA Phages

Phage such as lambda or T4 have much larger
genomes than do M13 or PhiX174. Large genomes are less
conveniently manipulated than small genomes. A phage
with a large genome, however, could be used if genetic
manipulation is sufficiently convenient. Phage such as
lambda and T4 have more complicated 3D capsid
structures than M13 or PhiX174, with more OSPs to
choose from. Phage lambda virions and phage T4 virions
form intracellularly, so that IPBDs requiring large or
insoluble prosthetic groups might fold on the surfaces
of these phage.

RNA Phages

RNA phage, such as Qbeta, are not preferred
because manipulation of RNA is much less convenient
than is the manipulation of DNA. Although competent
RNA bacteriophage are not preferred, useful genetically
altered RNA-containing particles could be derived from
RNA phage, such as MS2.
MS2 is a typical small RNA phage that carries only
three genes that are tightly regulated through RNA
structure and protein-RNA interactions. The RNA fills
the protein capsid so that no additional genes can be
accommodated. To use MS2 as a GP, we would need to
eliminate most of the natural viral genome so that an
osp-ipbd gene could fit into the protein capsid. It is
known that the A protein binds sequence-specifically to
a site at the 5' end of the + RNA strand, triggering
formation of RNA-containing particles if coat protein
is present. If a message containing the A protein
binding site and the gene for a chimera of coat protein
and a PBD were produced in a cell that also contained A
protein and wild-type coat protein (both produced from
regulated genes on a plasmid), then the RNA coding for
the chimeric protein would get packaged. The viral RNA
replicase gene is not needed because all components
needed for formation of particles are encoded in DNA. A
package comprising RNA encapsulated by proteins encoded
by that RNA satisfies the major criterion that the
genetic message inside the package specifies something
on the outside. The particles by themselves are not
viable. After isolating the packages that carry an
SBD, we would need to:

1) separate the RNA from the protein capsid,

2) reverse transcribe the RNA into DNA, using AMV
or MMTV reverse transcriptase, and

3) use Thermus aquaticus DNA polymerase for 25 or
more cycles of Polymerase Chain Reaction to
amplify the DNA until there is enough to subclone
the recovered genetic message into a plasmid for
sequencing and further work.
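
The amplification called for in step 3 can be put in rough numbers. The following is a minimal sketch, assuming an idealized reaction in which every cycle multiplies the template by (1 + efficiency). The function name `amplified_copies` and the `efficiency` parameter are illustrative assumptions and do not appear in the procedure described here.

```python
def amplified_copies(initial_copies, cycles, efficiency=1.0):
    """Estimate the number of template copies after PCR.

    Assumption (not from the source text): each cycle multiplies the
    template count by (1 + efficiency); efficiency = 1.0 means perfect
    doubling, smaller values model incomplete extension.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # 25 cycles of perfect doubling from one reverse-transcribed template
    print(amplified_copies(1, 25))
```

Under perfect doubling, 25 cycles give 2^25 (about 3.4 x 10^7) copies per starting template, which suggests why 25 or more cycles are specified before subcloning; real reactions fall short of this ideal.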

Alternatively, helper phage could be used to rescue the
isolated phage. In one of these ways we can recover a
sequence that codes for an SBD having desirable binding
properties. The in vitro amplification (SAIK85,
SCHA86, US Patents 4,683,202 and 4,683,195) may be
conveniently carried out using a Perkin-Elmer/Cetus
Thermal Cycler (part number N801-0150) and GeneAmp DNA
Amplification Reagent Kit (N801-0043) supplied by
Perkin-Elmer Corp., 761 Main Avenue, Norwalk, CT,
06859-0012, USA. The primers used in the Polymerase
Chain Reaction should be picked so that the osp-pbd
gene is the part of the reverse-transcribed DNA that is
amplified.

Although such a procedure is much more cumbersome
than use of DNA phage, it may be of interest if: 1) the
genetic package of the RNA phage is much more stable
than any DNA phage, 2) the 3D structure of an RNA phage
is known (f2 forms crystals inside E. coli, suggesting
that structure determination of f2 virion may be
practical), or 3) folding of a large protein inside the
cell is desired (this scheme allows almost the entire
3.5 Kb genome of MS2 to be used for chimeric coat
protein-PBD). Use of fusions involving MS2 coat
protein, together with wild-type MS2 coat protein, to
encapsulate genes demonstrates the most primitive
system that could be employed in the present invention.

Although the system has certain technical
inconveniences and therefore is not preferred, it could
be used.
Displaying IPBDs on Phages:



For a given bacteriophage, the preferred OSP is
usually one that is present on the phage surface in the
largest number of copies, as this allows the greatest
flexibility in varying the ratio of OSP-IPBD to wild
type OSP and also gives the highest likelihood of
obtaining satisfactory affinity separation. Moreover,
a protein present in only one or a few copies usually
performs an essential function in morphogenesis or
infection; mutating such a protein by addition or
insertion is likely to result in reduction in viability
of the GP.



It is preferred that the wild-type osp gene be
preserved. The ipbd gene fragment may be inserted
either into a second copy of the recipient osp gene or
into a novel engineered osp gene. It is preferred that
the osp-ipbd gene be placed under control of a
regulated promoter. Our process forces the evolution
of the PBDs derived from IPBD so that some of them
develop a novel function, viz. binding to a chosen
target. Placing the gene that is subject to evolution
on a duplicate gene is an imitation of the widely-
accepted scenario for the evolution of protein
families. It is now generally accepted that gene
duplication is the first step in the evolution of one
protein from an ancestral protein. By having two
copies of a gene, the affected physiological process
can tolerate mutations in one of the genes. This
process is well understood and documented for the
globin family (cf. DICK83, p65ff, and CREI84, p117-
125).



The preferred OSP for use when the GP is M13 is
the gene III protein (see Example 1).

Sec. 1.3.3: Choice of insertion site for IPBD in OSP:

The user must choose a site in the candidate OSP
gene for inserting an ipbd gene fragment. The coats of
most bacteriophage are highly ordered. Filamentous
phage can be described by a helical lattice; isometric
phage, by an icosahedral lattice. Each monomer of each
major coat protein sits on a lattice point and makes
defined interactions with each of its neighbors.
Proteins that fit into the lattice by making some, but
not all, of the normal lattice contacts are likely to
destabilize the virion by: a) aborting formation of the
virion, b) making the virion unstable, or c) leaving
gaps in the virion so that the nucleic acid is not
protected. Thus in bacteriophage, unlike the cases of
bacteria and spores, it is important to retain most or
all of the residues of the parental OSP in engineered
OSP-IPBD fusion proteins.

Association of proteins into dimers, trimers, or
even larger structures represents yet another aspect of
protein binding. For proteins that form such
associations, heterologous mixtures of mutant and
normal proteins will form if the mutations have not
altered the interface between subunits. For example,
Ward et al. have shown that tyrosyl tRNA synthetase
will form heterodimers when mutant and normal protein
are allowed to refold together (WARD86). See also
Hickman and Levy (HICK88) who studied the multimeric
structures of the Tet(R) protein by engineering cells to
carry two different tet alleles and observing a Tet(R)
phenotype arising from the complementary alleles. They
conclude that the Tet(R) protein is multimeric.
Immunoglobulin formation depends on the ability of V(H)
domains and V(L) domains, each a part of a separately
synthesized protein, to associate independently of the
protein sequence in the antigen complementarity-
determining regions. In addition, the process of
immune complementation depends on the separability of
the binding properties of the complementarity-
determining regions from the binding properties of the
constant domains.

Auditore-Hargreaves, US Patents 4,470,925
(AUDI84a) and 4,479,895 (AUDI84b), teaches methods of
making hybrid antibodies that depend on association of
different antibody chains. These patents teach that
alterations far from the intermolecular interface do
not alter the association.
A preferred site for insertion of the ipbd gene
into the phage osp gene is one in which: a) the IPBD
folds into its original shape, b) the OSP domains fold
into their original shapes, and c) there is no
interference between the two domains. It is not
required that the IPBD and OSP domains have any
particular spatial relationship; hence the process of
this invention does not require use of the method of US
Patent '602.

If there is a 3D model of the phage that indicates
that either the amino or carboxy terminus of an OSP is
exposed to solvent, then the exposed terminus of that
mature OSP becomes the prime candidate for insertion of
the ipbd gene. A low resolution 3D model suffices.

In the absence of a 3D structure, the amino and
carboxy termini of the mature OSP are the best
candidates for insertion of the ipbd gene. A
functional fusion may require additional residues
between the IPBD and OSP domains to avoid unwanted
interactions between the domains. Random-sequence DNA,
or DNA coding for a specific sequence of a protein
homologous to the IPBD or OSP, can be inserted between
the osp fragment and the ipbd fragment if needed.

Fusion at a domain boundary within the OSP is also
a good approach for obtaining a functional fusion.
Smith exploited such a boundary when subcloning
heterologous DNA into gene III of f1 (SMIT85).

There are several methods of identifying domains.
Methods that rely on atomic coordinates have been
reviewed by Janin and Chothia (JANI85). These methods
use matrices of distances between alpha carbons
(C-alpha), dividing planes (cf. ROSE85), or buried
surface (RASH84). Chothia and collaborators have
correlated the behavior of many natural proteins with
domain structure (according to their definition).
Rashin correctly predicted the stability of a domain
comprising residues 206-316 of thermolysin (VITA84,
RASH84).

Many researchers have used partial proteolysis and
protein sequence analysis to isolate and identify
stable domains. (See, for example, VITA84, POTE87,
SCOT87, and PABO79.) Pabo et al. used calorimetry as
an indicator that the cI repressor from the coliphage
lambda contains two domains; they then used partial
proteolysis to determine the location of the domain
boundary.
It is generally believed that the part of the
polypeptide chain composing one domain folds almost
independently of the parts composing other domains.
There are natural proteins composed of two or more
domains for which there is strong evidence that
essentially the same domain occurs more than once, for
example ovomucoids and ovoinhibitors (SCOT87) and
kallikrein (CHUN86). Further, the same domain can
occur in several different proteins (SUDH85, GILB85,
and SCOT87).



If the only structural information available is
the amino acid sequence of the candidate OSP, we can
use the sequence to predict turns and loops. There is
a high probability that some of the loops and turns
will be correctly predicted (cf. Chou and Fasman,
(CHOU74)); these locations are also candidates for
insertion of the ipbd gene fragment.



Sec. 1.3.4: -In Vivo lo'j t i en for Pcguzo-or:p nrne 
Randon D::a rnser-is in factor :?! 1 JT^^ir csj. * 



Alternatively, a functional insertion site may be
determined by generating a number of recombinant
constructions and selecting the functional strain by
phenotypic characteristics. Because the OSP-IPBD must
fulfill a structural role in the phage coat, it is
unlikely that any particular random DNA sequence
coupled to the ipbd gene will produce a fusion protein
that fits into the coat in a functional way.
Nevertheless, random DNA inserted between large
fragments of a coat protein gene and the ipbd gene will
produce a population that is likely to contain one or
more members that display the IPBD on the outside of a
viable phage. A display probe, similar to that defined
in 1.1.4, is constructed and random DNA sequences
cloned into appropriate sites.
Sec. 2: Choice of IPBD:



An IPBD may be chosen from naturally occurring
proteins or domains of naturally occurring proteins, or
may be designed from first principles. A designed
protein may have advantages over natural proteins if:
a) the designed protein is more stable, b) the designed
protein is smaller, and c) the charge distribution of
the designed protein can be specified more freely.

A candidate IPBD must meet the following criteria:

1) a domain exists that will remain stable under
the conditions of its intended use (the domain may
comprise the entire protein that will be inserted,
e.g. BPTI),

2) knowledge of the amino acid sequence is
obtainable,

3) knowledge of the identity of the residues on
the domain's outer surface, and their spatial
relationships, is obtainable, and

4) a molecule is available having specific and
high affinity for the IPBD, AfM(IPBD).

Preferably, the IPBD is no larger than necessary
because it is easier to arrange restriction sites in
smaller amino-acid sequences and because a smaller
protein minimizes the metabolic strain on the GP or the
host of the GP. The usefulness of candidate IPBDs that
meet all of these requirements depends on the
availability of the information discussed below.

Information about candidate IPBDs that will be
used to judge the suitability of the IPBD includes: 1)
a 3D structure (knowledge strongly preferred), 2) one
or more sequences homologous to the IPBD (the more
homologous sequences known, the better), 3) the pI of
the IPBD (knowledge necessary in some cases), 4) the
stability and solubility as a function of temperature,
pH and ionic strength (preferably known to be stable
over a wide range and soluble in conditions of intended
use), 5) ability to bind metal ions such as Ca++ or
Mg++ (knowledge preferred; binding per se, no
preference), 6) enzymatic activities, if any (knowledge
preferred, activity per se has uses but may cause
problems), 7) binding properties, if any (knowledge
preferred, specific binding also preferred), 8)
availability of a molecule having specific and strong
affinity (Kd < 10^-11 M) for the IPBD (preferred), 9)
availability of a molecule having specific and medium
affinity (10^-8 M < Kd < 10^-6 M) for the IPBD
(preferred), 10) the sequence of a mutant of IPBD that
does not bind to the affinity molecule(s) (preferred),
and 11) absorption spectrum in visible, UV, NMR, etc.
(characteristic absorption preferred).

If only one species of molecule having affinity
for IPBD (AfM(IPBD)) is available, it will be used to:
a) detect the IPBD on the GP surface, b) optimize
expression level and density of the affinity molecule
on the matrix (Sec. 10.1), and c) determine the
efficiency and sensitivity of the affinity separation
(Secs. 10.2 and 10.3). As noted above, however, one
would prefer to have available two species of
AfM(IPBD), one with high and one with moderate affinity
for the IPBD. The species with high affinity would be
used in initial detection and in determining efficiency
and sensitivity (10.2 and 10.3), and the species with
moderate affinity would be used in optimization (10.1).

There are many candidate IPBDs, 20 or more, for
which all of the above information is available or is
reasonably practical to obtain, for example, bovine
pancreatic trypsin inhibitor (BPTI, 58 residues),
crambin (46 residues), third domain of ovomucoid (56
residues), T4 lysozyme (164 residues), and azurin (128
residues). Structural information can be obtained from
X-ray or neutron diffraction studies, NMR, chemical
cross linking or labeling, modeling from known
structures of related proteins, or from theoretical
calculations. 3D structural information obtained by
X-ray diffraction, neutron diffraction or NMR is
preferred because these methods allow localization of
almost all of the atoms to within defined limits.

Most of the PBDs derived from a PPBD according to
the process of the present invention affect residues
having side groups directed toward the solvent.
Reidhaar-Olson and Sauer (REID88) found that exposed
residues can accept a wide range of amino acids, while
buried residues are more limited in this regard.
Surface mutations typically have only small effects on
melting temperature of the PBD, but may reduce the
stability of the PBD. Hence the chosen IPBD should
have a high melting temperature (60°C acceptable, the
higher the better) and be stable over a wide pH range
(8.0 to 3.0 acceptable; 11.0 to 2.0 preferred), so that
the SBDs derived from the chosen IPBD by mutation and
selection-through-binding will retain sufficient
stability. Preferably, the substitutions in the IPBD
yielding the various PBDs do not reduce the melting
point of the domain below 50°C. Mutations may arise
that increase the stability of SBDs relative to the
IPBD, but the process of the present invention does not
depend upon this occurring.
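
The stability thresholds quoted above can be collected into a simple screen. This is a minimal sketch only: the function name `stability_rating`, the three labels, and the decision to treat a melting temperature below 60°C as disqualifying are assumptions made for illustration, since the text states only what is acceptable and what is preferred.

```python
def stability_rating(melting_temp_c, ph_low, ph_high):
    """Rate a candidate IPBD against the thresholds quoted in the text.

    melting_temp_c  : melting temperature in degrees C
    ph_low, ph_high : ends of the pH range over which the domain is stable

    Assumption: anything below 60 C, or not stable across pH 3.0-8.0,
    is labeled "unsuitable"; the source does not say this explicitly.
    """
    if ph_low > ph_high:
        raise ValueError("ph_low must not exceed ph_high")
    if melting_temp_c < 60.0:
        return "unsuitable"
    if ph_low <= 2.0 and ph_high >= 11.0:
        return "preferred"      # stable from pH 2.0 to 11.0
    if ph_low <= 3.0 and ph_high >= 8.0:
        return "acceptable"     # stable from pH 3.0 to 8.0
    return "unsuitable"


if __name__ == "__main__":
    # Hypothetical candidate values, not measurements from the source
    print(stability_rating(75.0, 2.0, 11.0))
    print(stability_rating(62.0, 3.0, 9.0))
```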

Two general characteristics of the target
molecule, size and charge, make certain classes of
IPBDs more likely than other classes to yield
derivatives that will bind specifically to the target.
Because these are very general characteristics, one can
divide all targets into six classes: a) large positive,
b) large neutral, c) large negative, d) small positive,
e) small neutral, and f) small negative. A small
collection of IPBDs, one or a few corresponding to each
class of target, will contain a preferred candidate
IPBD for any chosen target.
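
The six-way division of targets can be sketched as a small lookup. The function name `target_class` is hypothetical, and the caller is assumed to have already decided whether the target counts as large or small, because the text gives no numerical size cutoff.

```python
def target_class(is_large, net_charge):
    """Return one of the six target classes named in the text.

    is_large   : True for a large target, False for a small one
                 (the size judgment is left to the caller)
    net_charge : net charge under the conditions of intended use
    """
    size = "large" if is_large else "small"
    if net_charge > 0:
        charge = "positive"
    elif net_charge < 0:
        charge = "negative"
    else:
        charge = "neutral"
    return f"{size} {charge}"


if __name__ == "__main__":
    # Enumerate every combination to confirm there are exactly six classes
    classes = {target_class(big, q) for big in (True, False) for q in (-1, 0, 1)}
    print(sorted(classes))
```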

Alternatively, the user may elect to engineer a
GP(IPBD) for a particular target; Sec. 2.1 gives
criteria that relate target size and charge to the
choice of IPBD.

Sec. 2.1.1: Influence of target size on choice of IPBD:

If the target is a protein or other macromolecule,
a preferred embodiment of the IPBD is a small protein
such as BPTI from Bos taurus (58 residues), crambin
from rape seed (46 residues), or the third domain of
ovomucoid from Coturnix coturnix japonica (Japanese
quail) (56 residues) (PAPA82), because targets from
this class have clefts and grooves that can accommodate
small proteins in highly specific ways. If the target
is a macromolecule lacking a compact structure, such as
starch, it should be treated as if it were a small
molecule. Extended macromolecules with defined 3D
structure, such as collagen, should be treated as large
molecules.

If the target is a small molecule, such as a
steroid, a preferred embodiment of the IPBD is a
protein the size of ribonuclease from Bos taurus (124
residues), ribonuclease from Aspergillus oryzae (104
residues), hen egg white lysozyme from Gallus gallus
(129 residues), azurin from Pseudomonas aeruginosa (128
residues), or T4 lysozyme (164 residues), because such
proteins have clefts and grooves into which the small
target molecules can fit. The Brookhaven Protein Data
Bank contains 3D structures for all of the proteins
listed. Genes encoding proteins as large as T4
lysozyme can be manipulated by standard techniques for
the purposes of this invention.

If the target is a mineral, insoluble in water,
one must consider the nature of the molecular surface
of the mineral. Minerals that have smooth surfaces,
such as crystalline silicon, require medium to large
proteins, such as ribonuclease, as IPBD in order to
have sufficient contact area and specificity. Minerals
with rough, grooved surfaces, such as zeolites, could
be bound either by small proteins, such as BPTI, or
larger proteins, such as T4 lysozyme.



Sec. 2.1.2: Influence of target charge on choice of
IPBD:
Electrostatic repulsion between molecules of like
charge can prevent molecules with highly complementary
surfaces from binding. Therefore, it is preferred
that, under the conditions of intended use, the IPBD
and the target molecule either have opposite charge or
that one of them is neutral. In some cases it has been
observed that protein molecules bind in such a way that
like charged groups are juxtaposed by including
oppositely charged counter ions in the molecular
interface. Thus, inclusion of counter ions can reduce
or eliminate electrostatic repulsion and the user may
elect to include ions in the eluants used in the
affinity separation step. Polyvalent ions are more
effective at reducing repulsion than monovalent ions.
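
The charge preference stated above reduces to a sign test. The following is a minimal sketch, assuming net charges are supplied as signed numbers; the name `charges_compatible` is hypothetical, and the check deliberately ignores the counter-ion effect described in the same paragraph.

```python
def charges_compatible(ipbd_charge, target_charge):
    """True when the IPBD and target have opposite net charge,
    or when at least one of them is neutral.

    A product below zero means opposite signs; a product of zero
    means one (or both) of the molecules is neutral.
    """
    return ipbd_charge * target_charge <= 0


if __name__ == "__main__":
    print(charges_compatible(+4, -2))   # opposite charge
    print(charges_compatible(+4, +1))   # like charge
```

A pair that fails this test is not necessarily unusable, since polyvalent counter ions in the eluant may reduce the repulsion.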

Sec. 2.1.3: Other considerations in the choice of
IPBD:

If the chosen IPBD is an enzyme, it may be
necessary to change one or more residues in the active
site to inactivate enzyme function. For example, if
the IPBD were T4 lysozyme and the GP were E. coli cells
or M13, we would need to inactivate the lysozyme
because otherwise it would lyse the cells. If, on the
other hand, the GP were PhiX174, then inactivation of
lysozyme may not be needed because T4 lysozyme can be
overproduced inside E. coli cells without detrimental
effects and PhiX174 forms intracellularly. It is
preferred to inactivate enzyme IPBDs that might be
harmful to the GP or its host by substituting mutant
amino acids at one or more residues of the active site.
It is permitted to vary one or more of the residues
that were changed to abolish the original enzymatic
activity of the IPBD. Those GPs that receive osp-pbd
genes encoding an active enzyme may die, but the
majority of sequences will not be deleterious.

If the binding protein is intended for therapeutic
use in humans or animals, the IPBD may be chosen from
proteins native to the designated recipient to minimize
the possibility of antigenic reactions.

Sec. 3: Choice of OCV:

The OCV is preferably small, e.g., less than 10
kb. The size of the OCV affects the stability of the
OCV and its derivatives, and the copy number thereof.
An OCV which is stable, even after insertion of at
least 1 kb DNA, is sought. A multicopy OCV is also of
interest. It is desirable that cassette mutagenesis be
practical in the OCV; preferably, at least 25
restriction enzymes are available that do not cut the
OCV. It is likewise desirable that single-stranded
mutagenesis be practical. Finally, the OCV preferably
carries a selectable marker.
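
The size and restriction-site guidelines above lend themselves to a quick computational check. The sketch below is illustrative only: the function names, the eight-enzyme table, and the treatment of the vector as circular by default are assumptions, and a real screen would draw on a full enzyme catalogue rather than this short list. The recognition sequences shown are the standard palindromic sites for these enzymes, so scanning one strand is sufficient.

```python
# Small illustrative table of palindromic recognition sites.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "PstI": "CTGCAG",
    "KpnI": "GGTACC",
    "SalI": "GTCGAC",
    "XbaI": "TCTAGA",
    "SmaI": "CCCGGG",
}


def non_cutters(vector_seq, sites=RECOGNITION_SITES, circular=True):
    """List enzymes whose recognition site is absent from the vector."""
    seq = vector_seq.upper()
    longest = max((len(s) for s in sites.values()), default=1)
    if circular:
        # Append the start of the sequence so that sites spanning the
        # origin of a circular vector are also detected.
        seq = seq + seq[: longest - 1]
    return sorted(name for name, site in sites.items() if site not in seq)


def meets_ocv_guidelines(vector_seq, sites=RECOGNITION_SITES,
                         max_length=10_000, min_non_cutters=25):
    """Check the 'less than 10 kb' and 'at least 25 non-cutters' guidelines."""
    short_enough = len(vector_seq) < max_length
    enough_enzymes = len(non_cutters(vector_seq, sites)) >= min_non_cutters
    return short_enough and enough_enzymes


if __name__ == "__main__":
    toy_vector = "AAGAATTCAA"  # hypothetical sequence containing an EcoRI site
    print(non_cutters(toy_vector))
```

With only eight enzymes in the table the 25-enzyme guideline can never be met, so `min_non_cutters` must be lowered (or the table extended) before the second function is informative.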

If a suitable OCV does not already exist, it may
be engineered by manipulation of available vectors.

In the cases of bacterial cells and bacterial
spores, the bacterial chromosome could be used as the
OCV. Plasmids are, however, preferred because genes on
plasmids are much more easily constructed and mutated
than are genes in the bacterial chromosome. When
bacteriophage are to be used, the osp-ipbd gene must be
inserted into the phage genome. The synthetic osp-ipbd
genes can be constructed in small vectors and
transferred to the GP genome when complete.
Phage such as M13 do not confer antibiotic
resistance on the host so that one can not select for
cells infected with M13. An antibiotic resistance gene
can be engineered into the M13 genome (HINE80). More
virulent phage, such as PhiX174, make discernable
plaques that can be picked, in which case a resistance
gene is not essential; furthermore, there is no room in
the PhiX174 virion to add any new genetic material.
Inability to include an antibiotic resistance gene is a
disadvantage because it limits the number of GPs that
can be screened.
It is preferred that GP(IPBD) carry a selectable
marker not carried by wtGP. It is also preferred that
wtGP carry a selectable marker not carried by GP(IPBD).



Sec. 4: Designing the osp-ipbd gene insert:
Having chosen an IPBD, a GP, a strategy for getting
the IPBD onto the GP surface, and a cloning vector, we
now turn to the design of a suitably regulated gene.
In this section, we design an amino acid sequence that
will cause the IPBD to appear on the GP surface when it
is expressed. This amino acid sequence may determine
the entire coding region of the osp-ipbd gene, or it
may contain only the ipbd sequence adjoining
restriction sites into which random DNA will be cloned
(Sec. 6.2).

We will now consider the transcriptional
regulation of the osp-ipbd gene; the design of the DNA
encoding of amino acid sequences; the organization of
synthesis; the methods of DNA synthesis and
purification; and the actual gene synthesis and
cloning.
The actual gene may be: a) completely synthetic,
b) a composite of natural and synthetic DNA, or c) a
composite of natural DNA fragments. The important
point is that the pbd segment, derived from the ipbd
segment, be easily genetically manipulated in the ways
described in Part III. A synthetic ipbd segment is
preferred because it allows greatest control over
placement of restriction sites. Primers complementary
to regions abutting the osp-ipbd gene on its 3' flank
and to parts of the osp-ipbd gene that are not to be
varied are needed for sequencing.



Sec. 4.1: Genetic regulation of the osp-ipbd gene:

Now consider regulation of the osp-ipbd gene to
enable modulation of expression. The two important
questions are: a) how much OSP-IPBD do we need on each
GP, and b) how accurately must we regulate the amount?
The essential function of the affinity separation
is to separate GPs that bear PBDs (derived from IPBD)
having high affinity for the target from GPs bearing
PBDs having low affinity for the target. If the
elution volume of a GP depends on the number of PBDs on
the GP surface, then a GP bearing many PBDs with low
affinity, GP(PBDw), might co-elute with a GP bearing
fewer PBDs with high affinity, GP(PBDs). Assume that
both GP(PBDw) and GP(PBDs) bind to the column under
some condition, such as low salt. If a gradient of
some solute, such as increasing salt, changes the
conditions, then all weakly-binding PBDs will cease to
bind before any strongly-binding PBDs cease to bind.
Regulation of the osp-pbd gene must be such that all
packages display sufficient PBD to effect a good
separation in Sec. 15. If the amount of PBD/GP had an
effect on the elution volume of the GP from the
affinity matrix, then we would need to regulate the
amount of PBD/GP very accurately. The following
analysis shows that there is no strong linear effect of
IPBD/GP on elution volume and assumes only: a) that all
GPs are the same size, b) that interactions between the
PBDs and the affinity matrix dominate differential
elution of GPs, c) that the system is at equilibrium,
and d) that all PBDs on any one GP are identical.

If N_b identical PBDs on a GP each have access to
target molecules, and each PBD has a free energy of
binding to the target of delta G_b, then the total free
energy of binding is

delta G_b^tot = N_b * delta G_b .

Delta G_b will be a function of several parameters of
the solvent, such as: 1) concentration of ions, 2) pH,
3) temperature, 4) concentration of neutral solutes
such as sucrose, glucose, ethanol, etc., 5) specific
ions, such as calcium, acetate, benzoate, nicotinate,
etc. If conditions are altered during affinity
separation so that delta G_b approaches zero, delta
G_b^tot approaches zero N_b times faster. As delta G_b^tot
goes to or above zero, the packages will dissociate
from the immobilized target molecules and be eluted.

GPs bearing more PBDs have a sharper transition
between bound and unbound than packages with fewer of
the same PBDs. For equilibrium conditions, the mid-
point of the transition is determined only by the
solution conditions that bring the individual
interactions to zero free energy. The number of
PBDs/GP determines the sharpness of the transition.
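
The argument above can be illustrated numerically. The following is a minimal sketch under a stated assumption: a package is treated as a two-state system (bound or unbound) whose bound fraction follows a Boltzmann weighting of the total free energy, N times the per-domain free energy. That two-state model, the function names, and the units (kcal/mol) are illustrative choices and are not taken from the analysis itself.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def total_binding_free_energy(n_pbd, dg_single):
    """Total free energy of binding: number of PBDs times per-PBD value."""
    return n_pbd * dg_single


def fraction_bound(n_pbd, dg_single, temperature_k=298.15):
    """Bound fraction in an assumed two-state equilibrium model.

    fraction = 1 / (1 + exp(dG_total / RT)); negative dG favors binding.
    """
    x = total_binding_free_energy(n_pbd, dg_single) / (R_KCAL * temperature_k)
    # Guard against overflow in math.exp for very large |x|.
    if x > 700:
        return 0.0
    if x < -700:
        return 1.0
    return 1.0 / (1.0 + math.exp(x))


if __name__ == "__main__":
    # Sweep the per-PBD free energy through zero for packages with
    # few and many PBDs (hypothetical copy numbers).
    for dg in (-0.5, -0.1, 0.0, 0.1, 0.5):
        print(dg, round(fraction_bound(1, dg), 3), round(fraction_bound(5, dg), 3))
```

In this model the half-bound point falls at a per-domain free energy of zero whatever the copy number, while a larger copy number only steepens the curve around that point, which is the behavior the text describes.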



It should also be noted that the number of PBDs/GP
is usually influenced by physiological conditions so
that a sample of genetically identical GP(PBD)s may
contain GPs having different numbers of PBDs on the GP
surface. In a population of GP(vgPBD)s each PBD
sequence will appear on more than one GP, and the
actual number of PBDs/GP will vary from GP to GP within
some range. Within a variegated population of PBDs,
let PBDx be the PBD with maximum affinity for the
target. If there is a linear effect of number of
PBDs/GP, then the GPs having the greatest number of
PBDx will be most retarded on the column. When we
culture the enriched population obtained either as an
effluent from the column or as an inoculum of matrix
material from the column, the GP(PBDx) will be
amplified and give rise to new GP(PBDx)s having varying
numbers of PBDx/GP. Thus the affinity separation
process of the present invention could tolerate a
linear effect of number of PBDs/GP on the elution
volume of the GP(PBD) unless strong binding to target
fortuitously causes the PBD to be displayed on the GP
only in low number. It is extremely unlikely that all
PBDs that bind to the target will also be incapable of
display in large amounts on the GP surface.

According to the above analysis, there is no
linear effect on elution volume from the number of
IPBDs/GP, hence need for highly accurate regulation of
IPBD/GP is not anticipated. The analysis above assumes
that GP(IPBD)s are in equilibrium between solution in
buffer and bound to the affinity matrix. Rate of
elution may be an important parameter in column
affinity chromatography. In batch elution from an
affinity matrix or elution from an affinity plate, the
time that each buffer is in contact with the affinity
material may be an important variable. The density of
affinity molecules on the matrix is an important
variable in optimizing the affinity separation.
Because the analysis above is qualitative, in Sec. 10
of the preferred embodiment we experimentally optimize:
1) the density of IPBD on the GP surface, 2) the
density of affinity molecules on the affinity matrix,
3) the initial ionic strength, 4) the elution rate, and
5) the quantity of GP/(volume of matrix) to be loaded
on the column.

A number of promoters are known that can be
controlled by specific chemicals added to the culture
medium. For example, the lacUV5 promoter is induced if
isopropylthiogalactoside is added to the culture
medium, for example, at between 1.0 uM and 10.0 mM.
Hereinafter, we use "XINDUCE" as a generic term for a
chemical that induces expression of a gene.

Transcriptional regulation of gene expression is
best understood and most effective, so we focus our
attention on the promoter. If transcription of the
osp-ipbd gene is controlled by the chemical XINDUCE,
then the number of OSP-IPBDs per GP increases for
increasing concentrations of XINDUCE until a fall-off
in the number of viable packages is observed or until
sufficient IPBD is observed on the surface of harvested
GP(IPBD)s. The attributes that affect the maximum
number of OSP-IPBDs per GP are primarily structural in
nature. There may be steric hindrance or other
unwanted interactions between IPBDs if OSP-IPBD is
substituted for every wild-type OSP. Excessive levels
of OSP-IPBD may also adversely affect the solubility or
morphogenesis of the GP. For cellular and viral GPs,
as few as five copies of a protein having affinity for
another immobilized molecule have resulted in
successful affinity separations (FERE82a, FERE82b, and
SMIT85).

Another consideration of promoter regulation is
that it is useful later to know the range of regulation
of the osp-ipbd gene (Sec. 8). In particular, one should
determine how nearly the absence of XINDUCE leads to
the absence of IPBD on the GP surface; a non-leaky
promoter is preferred. Non-leakiness is useful: a) to
show that affinity of GP(osp-ipbd)s for AfM(IPBD) is
due to the osp-ipbd gene, and b) to allow growth of
GP(osp-ipbd) in the absence of XINDUCE if the expression
of osp-ipbd is disadvantageous. The lacUV5 promoter in
conjunction with the LacI^q repressor is a preferred
example.

Sec. 4.2: DNA sequence design:



The present invention is not limited to a single
method of gene design. The following procedure is an
example of one method of gene design that fills the
needs of the present invention.

Having specified that the amount of IPBD/GP is to
be experimentally optimized and that well-studied
available regulatory mechanisms applied to the osp-ipbd
gene are sufficient, we now consider design of a DNA
sequence. If the amino-acid sequence of OSP-IPBD is a
definite sequence, then the entire gene will be
constructed (Sec. 6.1). If random DNA is to be fused
to ipbd, then a "display probe" is constructed first;
the random DNA is then inserted to complete the
population of putative osp-ipbd genes (Sec. 6.2) from
which a functional osp-ipbd gene is identified by in
vivo selection or kindred techniques.

The osp-ipbd gene need not be synthesized in toto;
parts of the gene may be obtained from nature. One
may use any genetic engineering method to produce the
correct gene fusion, so long as one can easily and
accurately direct mutations to specific sites in the
pbd DNA subsequence (Sec. 14.1). In all of the methods
of mutagenesis considered in the present invention,
however, it is necessary that the DNA sequence for the
osp-ipbd gene be different from any other DNA in the
OCV. The degree and nature of difference needed is
determined by the method of mutagenesis to be used in
Sec. 14.1. If the method of mutagenesis is to be
replacement of subsequences coding for the PBD with
vgDNA, then the subsequences to be mutagenized must be
bounded by restriction sites that are unique with
respect to the rest of the OCV. If single-stranded-
oligonucleotide-directed mutagenesis is to be used,
then the DNA sequence of the subsequence coding for the
IPBD must be unique with respect to the rest of the
OCV.

The sequences of regulatory parts of the gene are
taken from the sequences of natural regulatory
elements: a) promoters, b) Shine-Dalgarno sequences,
and c) transcriptional terminators. Regulatory
elements could also be designed from knowledge of
consensus sequences of natural regulatory regions. The
sequences of these regulatory elements are connected to
the coding regions; restriction sites are also inserted
in or adjacent to the regulatory regions to allow
convenient manipulation.

The coding portions of genes to be synthesized are
designed at the protein level and then encoded in DNA.
The amino acid sequences are chosen to achieve various
goals, including: a) display of an IPBD on the surface
of a GP, b) change of charge on an IPBD, and c)
generation of a population of PBDs from which to select
an SBD. The ambiguity in the genetic code is exploited
to allow optimal placement of restriction sites and to
create various distributions of amino acids at
variegated codons.

Sec. 4.1: Specific DNA sequence assignment:

A computer program may be used to construct an
ambiguous DNA sequence coding for an amino-acid
sequence given by the user. That is, the DNA sequence
contains codes for all possible DNA sequences that
produce the stated amino acid sequence. The codes used
in the ambiguous DNA are shown in Table 1. An example
of an ambiguous DNA sequence is given in Table 3.
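
As an illustration of the kind of program described above, the
following Python sketch back-translates a peptide into ambiguous
DNA. It is an assumption-laden example, not the program referred
to in the text: it uses the standard genetic code and the IUPAC
ambiguity letters (the codes of Table 1 may differ), and the
function names are hypothetical. Because it merges bases position
by position, the result for amino acids with six codons (L, S, R)
also covers a few codons for other amino acids.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity letters (assumed here in place of Table 1).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
         "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
         "H": "ACT", "V": "ACG", "N": "ACGT"}
SET_TO_CODE = {frozenset(v): k for k, v in IUPAC.items()}


def ambiguous_codon(aa):
    """Per-position union of all codons for one amino acid."""
    codons = [c for c, a in CODON_TABLE.items() if a == aa]
    if not codons:
        raise ValueError(f"unknown amino acid: {aa!r}")
    return "".join(SET_TO_CODE[frozenset(c[i] for c in codons)]
                   for i in range(3))


def ambiguous_dna(protein):
    """Ambiguous DNA (one-letter codes) covering a peptide."""
    return "".join(ambiguous_codon(aa) for aa in protein.upper())


if __name__ == "__main__":
    print(ambiguous_dna("KSLW"))
```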
The user supplies lists of restriction enzymes
that: a) do not cut the OCV, and b) cut the OCV only
once or twice. For each enzyme the program reads: a)
the name, b) the recognition sequence, c) the cutting
pattern, and d) the names of suppliers. The ambiguous-DNA
sequence coding for the stated amino acid sequence
is examined for places that recognition sites for any
of the given enzymes could be created without altering
the amino-acid sequence. A master table of enzymes
could be obtained from the catalogues of enzyme
suppliers such as the suppliers listed in Table 4 or
other sources, such as Roberts' annual review of
restriction enzymes in Nucleic Acids Research.
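
The search for places where a recognition site could be created
without altering the amino-acid sequence can be sketched as
follows. This Python example is only an illustration under stated
assumptions: it uses the standard genetic code, it handles only
recognition sequences written with unambiguous bases, and the
function name find_silent_sites and its 0-based offsets are
hypothetical rather than taken from the program described here.

```python
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# Map each amino acid to its list of synonymous codons.
SYNONYMS = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    SYNONYMS.setdefault(aa, []).append("".join(codon))


def find_silent_sites(protein, site):
    """Offsets (0-based, in nts from the first codon) at which
    `site` can be embedded by choosing synonymous codons only."""
    protein, site = protein.upper(), site.upper()
    hits = []
    for off in range(3 * len(protein) - len(site) + 1):
        end = off + len(site)
        feasible = True
        # Codons are independent, so each one can be checked alone.
        for i in range(off // 3, (end - 1) // 3 + 1):
            lo, hi = max(off, 3 * i), min(end, 3 * i + 3)
            if not any(all(codon[p - 3 * i] == site[p - off]
                           for p in range(lo, hi))
                       for codon in SYNONYMS[protein[i]]):
                feasible = False
                break
        if feasible:
            hits.append(off)
    return hits


if __name__ == "__main__":
    # Hind III (AAGCTT) within the peptide K-S-L-W.
    print(find_silent_sites("KSLW", "AAGCTT"))
```

For the peptide K-S-L-W, the only placement of AAGCTT found by
this sketch uses AAA for K, AGC for S, and TTA or TTG for L.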
Each potential recognition site causes a record
similar to the following to be written:

    Hind III  CS,B,M,I,N,P  Loc=0  T=9  B=13  Dir=n  Cut elective
    Protein seq  :   k -  s -  l -  w
    aa #         :
    possible DNA :  AAr|AGy|TTr|TGG    Cut # 1/6
    cutter       :    A|AGC|TT         5' NNNNNNA      AGCTTNNNNN 3'
    result       :  AAA|AGC|TTr        3' NNNNNNTTCGA      ANNNNNN 5'


The top line identifies the enzyme, Hind III in
this example, and the supplier (through codes given in
Table 4); "Loc=0" indicates that recognition begins
with nucleotide 0; "T=9" indicates that the antisense
(top) strand of DNA is cut after base 9; "B=13"
indicates that the sense strand (or bottom strand, not
shown except in the dsDNA on the right) is cut between
bases 13 and 14 (reading left to right). "Dir=n"
indicates that recognition is "normal". Hind III
recognizes palindromic sequences, as do most
restriction enzymes. Some enzymes have asymmetric
recognition, however, and cut to one side; for these
enzymes, the recognition could be "normal" or
"reversed" depending on whether the enzyme cuts to the
right or left of the recognition site. Rare
unambiguous stretches that require certain restriction
sites are labeled as "obligatory"; those that are
elective are so labeled.



The second and third lines show the amino-acid
sequence and residue numbers for which this region of
DNA codes. The notation "Cut # 1/6" indicates that
this is the first of six possible Hind III sites.
The fourth line shows the antisense strand of DNA
coding for the desired amino-acid sequence. The fifth
line shows the recognition pattern of the enzyme. The
sixth line shows the consensus between the DNA sequence
required by the amino acid sequence and the DNA
sequence recognized by the restriction enzyme. The
dsDNA to the right shows the ends generated by the
restriction digestion.

The program also prints a table summarizing the
possible sites. An example of such a summary of
potential sites is found in Table 5.

The choice of elective restriction sites to be
built into the gene is determined as follows.

The goal is to have a series of fairly uniformly
spaced unique restriction sites with no more than a
preset maximum number of bases, for example 100,
between sites. Unless required by other sites, sites
that are not present in the parental OCV are not
introduced into the designed gene more than once.
Sites that occur only once or twice in the parental OCV
are not introduced into the designed gene unless
necessary.



First, each enzyme that has a unique possible site
is picked; if two of these overlap, then the better
enzyme is picked. An enzyme is better if it: a)
generates cohesive ends, b) has unambiguous
recognition, or c) has higher specific activity. Next,
those sites close to other sites already picked are
eliminated because many sites very close together are
not useful. Finally, sites are chosen to minimize the
size of the longest piece between restriction sites.
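
Two of the steps above, eliminating sites that lie too close
together and measuring the longest piece between sites, can be
expressed compactly. The Python sketch below is a simplified
illustration; the function names and the min_gap parameter are
hypothetical, and the ranking of enzymes by end type, recognition
ambiguity, or specific activity is not modeled.

```python
def prune_close_sites(sites, min_gap):
    """Keep sites left to right, dropping any site that lies
    closer than min_gap nts to the previously kept site."""
    kept = []
    for pos in sorted(sites):
        if not kept or pos - kept[-1] >= min_gap:
            kept.append(pos)
    return kept


def longest_piece(gene_length, sites):
    """Length of the longest stretch between adjacent cut
    positions, counting both ends of the gene as boundaries."""
    cuts = [0] + sorted(sites) + [gene_length]
    return max(b - a for a, b in zip(cuts, cuts[1:]))


if __name__ == "__main__":
    candidates = [10, 12, 100, 105, 200]
    chosen = prune_close_sites(candidates, min_gap=20)
    print(chosen, longest_piece(300, chosen))
```

A candidate set of sites would be acceptable, in the sense of the
text, when longest_piece does not exceed the preset maximum, for
example 100.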
The ambiguity of the DNA between the restriction
sites is resolved from the following considerations.
If the given amino acid sequence occurs in the
recipient organism, and if the DNA sequence of the gene
in the organism is known, then, preferably, we maximize
the differences between the engineered and natural
genes to minimize the potential for recombination. In
addition, the following codons are poorly translated in
E. coli and, therefore, are avoided if possible:
cta(L), cga(R), cgg(R), and agg(R). For other host
species, different codon restrictions would be
appropriate. Finally, long repeats of any one base are
prone to mutation and thus are avoided. Balancing
these considerations, we can design a DNA sequence.
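
Two of the considerations above, avoiding the poorly translated
codons and maximizing differences from a natural gene, can be
sketched in Python. This is an illustrative simplification with
hypothetical function names: it treats each codon independently,
assumes the standard genetic code, and does not preserve chosen
restriction sites or check for long runs of one base.

```python
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)

# Codons named in the text as poorly translated in E. coli.
AVOID = {"CTA", "CGA", "CGG", "AGG"}


def translate(dna):
    """Translate DNA (length a multiple of 3) to one-letter code."""
    return "".join(CODON_TABLE[dna[i:i + 3]]
                   for i in range(0, len(dna), 3))


def recode(natural_dna, avoid=AVOID):
    """Re-encode the same protein, picking for each codon the
    allowed synonym that differs most from the natural codon."""
    natural_dna = natural_dna.upper()
    out = []
    for i in range(0, len(natural_dna) - len(natural_dna) % 3, 3):
        nat = natural_dna[i:i + 3]
        options = SYNONYMS[CODON_TABLE[nat]]
        allowed = [c for c in options if c not in avoid] or options
        out.append(max(allowed,
                       key=lambda c: sum(x != y for x, y in zip(c, nat))))
    return "".join(out)


if __name__ == "__main__":
    natural = "AAAAGCTTGTGGCTAAGG"
    print(recode(natural), translate(recode(natural)))
```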

Sec. 5.1: Organization of gene synthesis:

Now we consider ways to divide the synthesis of
the designed gene into manageable segments. The
present invention is not limited as to how a designed
DNA sequence is divided for easy synthesis. The
following procedure is an example of how such synthesis
might be managed.

An established method is to synthesize both
strands of the entire gene in overlapping segments of
20 to 50 nucleotides (nts) (THER88). Below we provide
an alternative method that is more suitable for
synthesis of vgDNA. This method is similar to methods
published by Oliphant et al. (OLIP86 and OLIP87) and
Ausubel et al. (AUSU87). Our adaptation of this method
differs from previous methods in that we: a) use two
synthetic strands, and b) do not cut the extended DNA
in the middle. Our goals are: a) to produce longer
pieces of dsDNA than can be synthesized as ssDNA on
commercial DNA synthesizers, and b) to produce strands
complementary to single-stranded vgDNA. By using two
synthetic strands, we remove the requirement for a
palindromic sequence at the 3' end.



DNA synthesizers can currently produce oligo-nts
of lengths up to 100 nts in reasonable yield, M_DNA =
100. The parameters N_w (the length of overlap needed
to obtain efficient annealing) and N_s (the number of
spacer bases needed so that a restriction enzyme can
cut near the end of blunt-ended dsDNA) are determined
by DNA and enzyme chemistry. N_w = 10 and N_s = 5 are
reasonable values. Larger values of N_w and N_s are
allowed but add to the length of ssDNA that must be
synthesized and reduce the net length of dsDNA that can
be produced.

Let A_L be the actual length of dsDNA to be
synthesized, including any spacers. A_L must be no
greater than (2 M_DNA - N_w). Let Q_w be the number of
nts that the overlap window can deviate from center,

    $Q_w = (2 M_{DNA} - N_w - A_L)/2$ .

Q_w is never negative. It is preferred that the two
fragments be approximately the same length so that the
amounts synthesized will be approximately equal. This
preference may be overridden by other considerations.
The overall yield of dsDNA is usually dominated by the
synthetic yield of the longer oligo-nt.

We use the following procedure to generate dsDNA
of lengths up to (2 M_DNA - N_w) nts through the use of
Klenow fragment to extend synthetic DNA fragments
that are not more than M_DNA nts long. When a pair of
long oligo-nts, complementary for N_w nts at their 3'
ends, are annealed there will be a free 3' hydroxyl and
a long ssDNA chain continuing in the 5' direction on
either side. We will refer to this situation as a 5'
superoverhang. The procedure comprises:

1) picking a non-palindromic subsequence of N_w to
N_w+4 nts near the center of the dsDNA to be
synthesized; this region is called the overlap
(typically, N_w is 10),

2) synthesizing a ssDNA molecule that comprises
that part of the anti-sense strand from its 5' end
up to and including the overlap,

3) synthesizing a ssDNA molecule that comprises
that part of the sense strand from its 5' end up
to and including the overlap,

4) annealing the two synthetic strands that are
complementary throughout the overlap region, and

5) extending both superoverhangs with Klenow
fragment and all four deoxynucleotide
triphosphates.

Because M_DNA is not rigidly fixed at 100, the
current limits of 190 (= 2 M_DNA - N_w) nts overall and
100 in each fragment are not rigid, but can be exceeded
by 5 or 10 nts. Going beyond the limits of 190 and 100
will lead to lower yields, but these may be acceptable
in certain cases.
Restriction enzymes do not cut well at sites
closer than about five base pairs from the end of blunt
dsDNA fragments (OLIP87). Therefore N_s nts (with N_s
typically set to 5) of spacer are added to ends that we
intend to cut with a restriction enzyme. If the
plasmid is to be cut with a blunt-cutting enzyme, then
we do not add any spacer to the corresponding end of
the dsDNA fragment.

To choose the optimum site of overlap for the
oligo-nt fragments, first consider the anti-sense
strand of the DNA to be synthesized, including any
spacers at the ends, written (in upper case) from 5' to
3' and left-to-right. N.B.: The N_w nt long overlap
window can never include bases that are to be
variegated. N.B.: The N_w nt long overlap should not be
palindromic lest single DNA molecules prime themselves.
Place an N_w nt long window as close to the center of the
anti-sense sequence as possible. Check to see whether
one or more codons within the window can be changed to
increase the GC content without: a) destroying a needed
restriction site, b) changing amino acid sequence, or
c) making the overlap region palindromic. If possible,
change some AT base pairs to GC pairs. If the GC
content of the window is less than 50%, slide the
window right or left as much as Q_w nts to maximize the
number of C's and G's inside the window, but without
including any variegated bases. For each trial setting
of the overlap window, maximize the GC content by
silent codon changes, but do not destroy wanted
restriction sites or make the overlap palindromic. If
the best setting still has less than 50% GC, enlarge
the window to N_w+2 nts and place it within five nts of
the center to obtain the maximum GC content. If
enlarging the window one or two nts will increase the
GC content, do so, but do not include variegated bases.
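
The placement of the overlap window can be sketched in Python as
follows. This is a simplified illustration with hypothetical
function names: it computes Q_w from M_DNA, N_w, and the sequence
length, then scans the allowed window positions for the highest
GC count, skipping palindromic windows and windows that contain
variegated positions. It does not attempt the silent codon
changes or the window enlargement described above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def max_window_shift(m_dna, n_w, a_l):
    """Q_w = (2*M_DNA - N_w - A_L)/2, floored to whole nts."""
    if a_l > 2 * m_dna - n_w:
        raise ValueError("dsDNA too long for two oligo-nts")
    return (2 * m_dna - n_w - a_l) // 2


def choose_overlap(seq, variegated=(), m_dna=100, n_w=10):
    """Return (window start, Q_w) for the best overlap window.

    seq: anti-sense strand, 5'-to-3', upper case, with spacers.
    variegated: 0-based positions that must stay out of the window.
    """
    q_w = max_window_shift(m_dna, n_w, len(seq))
    center = (len(seq) - n_w) // 2
    best = None
    first = max(0, center - q_w)
    last = min(len(seq) - n_w, center + q_w)
    for start in range(first, last + 1):
        window = seq[start:start + n_w]
        if any(start <= v < start + n_w for v in variegated):
            continue
        if window == reverse_complement(window):
            continue  # palindromic overlap could self-prime
        gc = window.count("G") + window.count("C")
        key = (gc, -abs(start - center))  # prefer GC, then centrality
        if best is None or key > best[0]:
            best = (key, start)
    if best is None:
        raise ValueError("no admissible overlap window")
    return best[1], q_w


if __name__ == "__main__":
    seq = "AT" * 10 + "GGCCGGCCGC" + "AT" * 10
    print(choose_overlap(seq))
```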



Underscore the anti-sense strand from the 5' end
up to the right edge of the window. Write the
complementary sense sequence 3'-to-5' and left-to-right
and in lower case letters, under the anti-sense strand
starting at the left edge of the window and continuing
all the way to the right end of the anti-sense strand.

We will synthesize the underscored anti-sense
strand and the part of the sense strand that we wrote.
These two fragments, complementary over the length of
the window of high GC content, are mixed in equimolar
quantities and annealed. These fragments are extended
with Klenow fragment and all four deoxynucleotide
triphosphates to produce ds blunt-ended DNA. This DNA
can be cut with appropriate restriction enzymes to
produce the cohesive ends needed to ligate the fragment
to other DNA.
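
The division into two synthetic strands and their extension can
be checked with a short simulation. The Python sketch below is an
illustration only; the function names are hypothetical, sequences
are written 5'-to-3', and the extension step is modeled as ideal
template copying rather than as real Klenow enzymology.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def design_oligos(antisense, start, n_w):
    """Split a duplex into two oligo-nts sharing an overlap.

    top:    anti-sense strand from its 5' end through the right
            edge of the window.
    bottom: sense strand from its 5' end (the right end of the
            duplex) through the left edge of the window.
    """
    top = antisense[:start + n_w]
    bottom = reverse_complement(antisense[start:])
    return top, bottom


def klenow_fill_in(top, bottom, n_w):
    """Anneal over the last n_w nts of each oligo-nt and extend
    the top strand's 3' end; returns the full anti-sense strand."""
    if top[-n_w:] != reverse_complement(bottom[-n_w:]):
        raise ValueError("3' ends are not complementary")
    # Copying the bottom template beyond the overlap adds the
    # missing right-hand part of the anti-sense strand.
    return top + reverse_complement(bottom[:-n_w])


if __name__ == "__main__":
    seq = "ATATATATATGGCCGGCCGCTATATATATA"
    top, bottom = design_oligos(seq, start=10, n_w=10)
    print(top, bottom, klenow_fill_in(top, bottom, 10) == seq)
```

In this model each oligo-nt is shorter than the duplex, and the
extended product reproduces the designed anti-sense strand; its
complement is the fully extended sense strand.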

Sec. 5.2: DNA synthesis and purification methods:

The present invention is not limited to any
particular method of DNA synthesis or construction.
The following procedures exemplify one way to achieve
the goals of the present invention.

DNA is synthesized on a Milligen 7500 DNA
synthesizer (Milligen, a division of Millipore
Corporation, Bedford, MA) by standard procedures.
Software to control the synthesizer and to keep records
of each synthesis is supplied by Milligen.

The following reagents are supplied by Milligen:

1) 1H-tetrazole in acetonitrile,
2) 3% (v/v) dichloroacetic acid in dichloromethane,
3) Acetic anhydride in 2,6 lutidine/acetonitrile
(1:1:8),
4) 6.5% dimethylaminopyridine in acetonitrile,
5) 0.1M iodine in 2,6
lutidine/water/tetrahydrofuran (8:8:84),
6) 3% (v/v) triethylamine in acetonitrile,
7) DMT-dAdenosine(Bz) cyanoethylphosphoramidite
8) DMT-dCytidine(Bz) cyanoethylphosphoramidite
9) DMT-dGuanosine(iBu) cyanoethylphosphoramidite
10) DMT-dThymidine cyanoethylphosphoramidite
11) Acetonitrile, anhydrous

Tetrazole and acetonitrile are stored over
molecular sieves to sequester water.

Phosphoramidites are dissolved in anhydrous
acetonitrile (Milligen) at 0.1 g/ml. All other
acetonitrile used in the syntheses is "Low-water
Acetonitrile" supplied by J. T. Baker Chemical Company
(Phillipsburg, NJ). Synthesis columns containing
supports charged with an initial base for each of A, C,
G, and T are obtained from Milligen in two types, high-
loading and low-loading. High-loading columns are used
for syntheses of oligo-nts containing up to 60 bases
and contain between 35 and 70 micromoles of amidite/g
of support. The exact amount varies from lot to lot.
Low-loading columns containing between 4 and 7
micromoles amidite/g support are used for syntheses of
oligo-nts containing 60 bases or more.




The Milligen 7500 has seven vials from which
phosphoramidites may be taken. Normally, the first
four contain A, C, T, and G. The other three vials
may contain unusual bases such as inosine or mixtures
of bases, the so-called "dirty bottle". The standard
software allows programmed mixing of two, three, or
four bases in equimolar quantities.

When a synthesis is complete, the DNA is removed
from the support by incubating the supports in 1 ml of
fresh 28-30% ammonium hydroxide solution (EM Science, a
division of EM Industries, Inc., Cherry Hill, NJ) for
15 hours at 50 degrees C. The solution is dried under
vacuum and the DNA resuspended in 200 microliters of
HPLC-grade water (Baker-Analyzed Reagent(TM), J.T.
Baker Chemical Co.) and is purified by high-pressure
liquid chromatography (HPLC) or PAGE.

With low-loading supports, a 65-base-long oligo-nt
is typically obtained at 1-2% of theoretical yield,
i.e. 10 ug; a 100-base-long oligo-nt is typically
obtained in 0.5% of theoretical yield, i.e. 5 ug. With
high-loading supports, 1 mg of a 20-base-long oligo-nt
is typically obtained.

The present invention is not limited to any
particular method of purifying DNA for genetic
engineering. HPLC is used for both oligo-nts and
fragments of several kb. Alternatively, agarose gel
electrophoresis and electroelution on an IBI device
(International Biotechnologies, Inc., New Haven, CT) is
used to purify large dsDNA fragments. For oligo-nts,
PAGE and electroelution with an Epigene device (Epigene
Corp., Baltimore, MD) are an alternative to HPLC. One
alternative for DNA purification is HPLC on a Waters
(division of Millipore Corporation) HPLC system using
the GenPak(TM) FAX column. A sample of 100 picograms
(pg) to 10 ug can be loaded and recovered in 10%-80%
yield. The recovery varies with the size and
concentration of the DNA, and whether it is single or
double stranded. A NAP5 column from Pharmacia (Sweden)
is used to desalt DNA eluted from the GenPak column.
After passage over the NAP5 column, the DNA solution is
vacuum desiccated.

Sec. 6.1: Cloning of Known osp-ipbd gene into OCV:

In this section, we clone the osp-ipbd gene or the
display probe that we have designed. In the preferred
method, the synthetic gene is constructed using
plasmids that are transformed into bacterial cells by
standard methods (MANI82, p250) or slightly modified
standard methods. Alternatively, DNA fragments derived
from nature are operably linked to other fragments of
DNA derived from nature or to synthetic DNA fragments.
In most cases the preferred method, gene synthesis,
involves construction of a series of plasmids
containing larger and larger segments of the complete
gene. Each plasmid that contains a newly added portion
of the osp-ipbd gene or of the display probe is tested
by restriction digestion. Plasmids having the expected
restriction digestion pattern are sequenced in the
region of the latest alteration to confirm the
synthesis.

If, for convenience, small plasmids were used for
gene synthesis, the complete osp-ipbd gene or display
probe is subcloned into the OCV at this point.

Sec. 6.2: Cloning of Random DNA (Potential osp) Into
Display Probe:

If random DNA and phenotypic selection or
screening are used to obtain a GP(IPBD), then we clone
random DNA into one of the restriction sites that was
designed into the display probe.

The random DNA may be obtained in a variety of
ways. Degenerate synthetic DNA is one possibility.
Alternatively, pseudorandom DNA may be taken from
nature. If, for example, an Sph I site (GCATG/C) has
been designed into the display probe at one end of the
ipbd fragment, then we would use Nla III (CATG/) to
partially digest some DNA that contains a wide variety
of sequences, generating a wide variety of fragments with
CATG 3' overhangs. Preferably, the display probe is
designed with different restriction sites at each end
of the ipbd gene so that random DNA can be cloned at
either end at the user's discretion. The genome of an
organism would be a suitable source of DNA with high
sequence diversity.
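
The compatibility of Sph I and Nla III ends mentioned above can
be checked with a small calculation. The Python sketch below is
illustrative only; the function name overhang and its convention
(the cut position is the number of recognition-site bases to the
5' side of the top-strand cut) are assumptions, and it applies
only to enzymes with palindromic recognition sites that cut
within the site.

```python
def overhang(site, cut):
    """Overhang left by a palindromic enzyme.

    site: recognition sequence, 5'-to-3'.
    cut:  number of site bases 5' of the cut on the top strand.
    Returns (kind, sequence) where kind is "3'", "5'", or "blunt".
    """
    mirror = len(site) - cut  # bottom-strand cut, in top coordinates
    lo, hi = sorted((cut, mirror))
    if cut > mirror:
        kind = "3'"
    elif cut < mirror:
        kind = "5'"
    else:
        kind = "blunt"
    return kind, site[lo:hi]


if __name__ == "__main__":
    sph1 = overhang("GCATGC", 5)   # GCATG/C
    nla3 = overhang("CATG", 4)     # CATG/
    print(sph1, nla3, sph1 == nla3)
```

Both enzymes leave the same self-complementary CATG 3' overhang,
which is why Nla III fragments can be ligated into an Sph I site.
Such a junction regenerates the Sph I site only when the inserted
fragment happens to supply the flanking G and C.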

A plasmid carrying the display probe is digested
with the appropriate restriction enzyme and the
fragmented, random DNA is annealed and ligated by
standard methods. The ligated plasmids are used to
transform cells that are grown and selected for
expression of the antibiotic-resistance gene. Plasmid-
bearing GPs are then selected for the display-of-IPBD
phenotype by the procedure given in Sec. 15 of the
present invention using AfM(IPBD) as if it were the
target. Sec. 15 is designed to isolate GP(PBD)s that
bind to a target from a large population that do not
bind. Use of the procedure of Sec. 15 to isolate a
genetic construction that leads to the display of a
single type of IPBD is different from the designed use
in one important way: any GP that displays the IPBD
will bind tightly and GPs that do not display IPBD will
not bind, hence any reasonable amount of AfM(IPBD) on
the matrix will identify a successful clone.

As an alternative to selecting GP(IPBD)s through
binding to an affinity column, we can isolate colonies
or plaques and screen through use of one of the methods
listed in Sec. 8 to identify clonal isolates that
display IPBD on the GP outer surface.



Sec. 7: Harvest of GPs:

After transforming cells with ligated cloning
vectors, we first grow the GPs in non-selective
conditions to allow expression of the antibiotic-
resistance markers on the cloning vector. After a
grow-out, we apply selective pressure to kill
untransformed cells.

GPs are harvested by methods appropriate to the GP
at hand, generally, centrifugation to pelletize GPs and
resuspension of the pellets in sterile medium (cells)
or buffer (spores or phage).

Sec. 8: Verification of Display Strategy:

The harvested packages are now tested to determine
whether the IPBD is present on the surface. In any
tests of GPs for the presence of IPBD on the GP
surface, any ions or cofactors known to be essential
for the stability of IPBD or AfM(IPBD) must be included
at appropriate levels. The tests can be done: a) by
affinity labeling, b) enzymatically, c)
spectrophotometrically, d) by affinity separation, or
e) by affinity precipitation. The AfM(IPBD) in this
step is one picked to have strong affinity
(preferably, K_d < 10^-11 M) for the IPBD molecule and
little or no affinity for the wtGP. For example, if
BPTI were the IPBD, trypsin, anhydrotrypsin, or
antibodies to BPTI could be used as the AfM(BPTI) to
test for the presence of BPTI. Anhydrotrypsin, a
trypsin derivative with serine 195 converted to
dehydroalanine, has no proteolytic activity but retains
its affinity for BPTI (AKOH72 and HUBE77).

Preferably, the presence of the IPBD on the
surface of the GP is demonstrated through the use of a
soluble, labeled derivative of an AfM(IPBD) with high
affinity for IPBD. The label could be: a) a
radioactive atom, such as 125I, b) a chemical entity
such as biotin, or c) a fluorescent entity such as
rhodamine or fluorescein. The labeled derivative of
AfM(IPBD) is denoted as AfM(IPBD)*. The preferred
procedure is:



1) mix AfM(IPBD)* with GPs that are to be tested
for the presence of IPBD; conditions of mixing
should favor binding of IPBD to AfM(IPBD)*,

2) separate GPs from unbound AfM(IPBD)* by use of:

a) a molecular sizing filter that will pass
AfM(IPBD)* but not GPs,

b) centrifugation, or

c) a molecular sizing column (such as
Sepharose or Sephadex) that retains free
AfM(IPBD)* but not GPs,

3) quantitate the AfM(IPBD)* bound by GPs.



Alternatively, if the IPBD has a known biochemical
activity (enzymatic or inhibitory), its presence on the
GP can be verified through this activity. For example,
if the IPBD were BPTI, then one could use the
stoichiometric inactivation of trypsin not only to
demonstrate the presence of BPTI, but also to
quantitate the amount.

If the IPBD has strong, characteristic absorption
bands in the visible or UV that are distinct from
absorption by the wtGP, then another alternative for
measuring the IPBD displayed on the GP is a
spectrophotometric measurement. For example, if IPBD
were azurin, the visible absorption could be used to
identify GPs that display azurin.

Another alternative is to label the GPs and
measure the amount of label retained by immobilized
AfM(IPBD). For example, the GPs could be grown with a
radioactive precursor, such as 32PO4 or 3H-thymidine,
and the radioactivity retained by immobilized AfM(IPBD)
measured.

Another alternative is to use affinity
chromatography; the ability of a GP bearing the IPBD to
bind a matrix (cf. Sec. 15.1) that supports an AfM(IPBD)
is measured by reference to the wtGP.

Another alternative for detecting the presence of
IPBD on the GP surface is affinity precipitation.

If random DNA has been used, then the procedures
of Sec. 15 are used to obtain a clonal isolate that has
the display-of-IPBD phenotype. Alternatively, clonal
isolates may be screened for the display-of-IPBD
phenotype. The tests of this step are applied to one
or more of these clonal isolates.

If no isolates that bind to the affinity molecule
are obtained we take corrective action as disclosed in
Sec. 9.

If one or more of the tests above indicates that
the IPBD is displayed on the GP surface, we verify that
the binding of molecules having known affinity for IPBD
is due to the chimeric osp-ipbd gene through the use of
standard genetic and biochemical techniques, such as:

1) transferring the osp-ipbd gene into the parent
GP to verify that osp-ipbd confers binding,

2) deleting the osp-ipbd gene from the isolated GP
to verify that loss of osp-ipbd causes loss of
binding,

3) showing that binding of GPs to AfM(IPBD)
correlates with [XINDUCE] (in those cases that
expression of osp-ipbd is controlled by
[XINDUCE]), and

4) showing that binding of GPs to AfM(IPBD) is
specific to the immobilized AfM(IPBD) and not to
the support matrix.
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Variation of: a) binding of CPs by soluble AtM(I?9D)*, 
b) absorption caused by IPDD, and c) biocheniicai 
reactions of IPBD are linear in the anount of IPBO 
displayed. Presence of IPDD on tlie CP surface is 
5 indicated by a stronq correlation between [XIUDUCE] and 
the reactions that are linear in the anount of IPBO. 
LeoJciness of the promoter is noc likely to present 
problems of high background with assays that are linear 
in the amount of IPBD.. These^ experiments may be 

10 quicker and easier than th^s genetic tests. 
Interpreting* -che" effect of (XINDUCEJ on binding to a 
(AfH(IPBD)l colu,-:in, however, may be problematic unless 
the regulated promoter is completely ' repressed in t.he 
absence of ( Xir.'DL'CE ] . The affinity retention of 

15 CP(IPQD)s is not linear in the number of IFBOs/C? and 
there nay be, for example, little phenotypic oifference 
between CPs bearing 5 IPBDg and CPs bearing 50 IF30s. 
The demonstration that binding is to AfM(irnO) ir.d the 
genetic tests are essential; tho tests with Xlt.'OL'CE: are 

20 optional. 

We sequence the relevant ipbd gene fragment from
each of several clonal isolates to determine the
construction.

We establish the maximum salt concentration and pH
range for which the GP(IPBD) binds the chosen
AfM(IPBD). This is preferably done by measuring, as a
function of salt concentration and pH, the retention of
AfM(IPBD)* on molecular sizing filters that pass
AfM(IPBD)* but not GP.

If the IPBD is displayed on the outside of the GP,
and if that display is clearly caused by the introduced
osp-ipbd gene, we proceed to Part II; otherwise we must
analyze the result and adopt appropriate corrective
measures.

Sec. 9: Perfecting the Display System:

If we have attempted to fuse an ipbd fragment to a
natural osp fragment, our options are:

1) pick a different fusion to the same osp by

a) using the opposite end of osp,

b) keeping more or fewer residues from osp in
the fusion; for example, in increments of 3
or 4 residues,

c) trying a known or predicted domain
boundary,

d) trying a predicted loop or turn position,

2) pick a different osp, or

3) switch to the random DNA method.

If we have just tried the random DNA method
unsuccessfully, our options are:

1) choose a different relationship between ipbd
fragment and random DNA (ipbd first, random DNA
second or vice versa),

2) try a different degree of partial digestion, a
different enzyme for partial digestion, a
different degree of shearing or a different source
of natural DNA, or

3) switch to the natural OSP method.

If all reasonable OSPs of the current GP have been
tried and the random DNA method has been tried, both
without success, we pick a new GP.

Summary of Part I:

In Part I, we have constructed a GP(IPBD).
Although the target material is not picked until Part
III, we have already discussed the general properties
of targets that influence the choice of IPBD. The user
may use the first GP(IPBD) as the starting point for
design and construction of other GPs: GP(IPBD1),
GP(IPBD2), etc. The different IPBDs might differ in
charge and size in such a way that, for any target, at
least one of the GP(IPBD)s will be appropriate as a
starting point to develop a protein that will bind to
that target.

Part II

Sec. 10.0: Affinity Separation Means:

In Part II we optimize an affinity separation
system that will be used in Part III to enrich a
population of GP(vgPBD)s for those GP(PBD)s that
display PBDs with increased affinity for the target.

Affinity chromatography is the preferred means,
but FACS, electrophoresis, or other means may also be
used.

Sec. 10.1: Optimization of Affinity Chromatography
Separation:

For linear gradients, elution volume and eluant
concentration are directly related. Changes in eluant
concentration cause GPs to elute from the column.
Elution volume, however, is more easily measured and
specified. It is to be understood that the eluant
concentration is the agent causing GP release and that
an eluant concentration can be calculated from an
elution volume and the specified gradient.
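
The conversion from elution volume to eluant concentration on a
linear gradient can be sketched as follows. This is only an
illustration under stated assumptions: the function name, the
choice to hold the concentration constant once the gradient is
complete, and the example numbers (10 mM to 1000 mM over 5 volume
units) are hypothetical and are not taken from the specification.

```python
def eluant_concentration(elution_volume, c_start, c_end, gradient_volume):
    """Eluant concentration at a given elution volume on a linear gradient.

    All volumes share one unit and all concentrations share one unit.
    Assumption: past the end of the gradient the concentration is held
    at c_end.
    """
    if gradient_volume <= 0:
        raise ValueError("gradient_volume must be positive")
    if elution_volume < 0:
        raise ValueError("elution_volume must be non-negative")
    # Fraction of the gradient that has been delivered, capped at 1.
    fraction = min(elution_volume / gradient_volume, 1.0)
    return c_start + (c_end - c_start) * fraction


# Hypothetical gradient: 10 mM to 1000 mM over 5 volume units.
halfway = eluant_concentration(2.5, 10.0, 1000.0, 5.0)
print(halfway)  # 505.0
```

Because the relation is linear, an elution volume reported for a
given GP identifies the eluant concentration at which that GP was
released, provided the gradient is specified.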

Using a specified elution regime, we compare the
elution volumes of GP(IPBD)s with the elution volumes
of wtGP on affinity columns supporting AfM(IPBD).
Comparisons are made at various: a) amounts of IPBD/GP,
b) densities of AfM(IPBD)/(volume of matrix) (DoAMoM),
c) initial ionic strengths, d) elution rates, e)
amounts of GP/(volume of support), f) pHs, and g)
temperatures, because these are the parameters most
likely to affect the sensitivity and efficiency of the
separation. We then pick those conditions giving the
best separation.

We do not optimize pH or temperature; rather we
record optimal values for the other parameters for one
or more values of pH and temperature. The pH used must
be within the range of pH for which GP(IPBD) binds the
AfM(IPBD) that is being used in this step. The
conditions of intended use, specified by the user (Sec.
11), may include a specification of pH or temperature.
If pH is specified, then pH will not be varied in
eluting the column (Sec. 15.3). Decreasing pH may,
however, be used to liberate bound GPs from the matrix.
Similarly, if the intended use specifies a temperature,
we will hold the affinity column at the specified
temperature during elution, but we might vary the
temperature during recovery. If the intended use
specifies the pH or temperature, then we prefer that
the affinity separation be optimized for all other
parameters at the specified pH and temperature.

In the optimization devised in this step, we
preferably use a molecule known to have moderate
affinity for the IPBD (K_d in the range 10^-[illegible] M
to 10^-[illegible] M), for the following reason. When
populations of GP(vgPBD)s are fractionated, there will
be roughly three subpopulations: a) those with no
binding, b) those that have some binding but can be
washed off with high salt or low pH, and c) those that
bind very tightly and must be rescued in situ. We
optimize the parameters to separate (a) from (b) rather
than (b) from (c). Let PBD_w be a PBD having weak
binding to the target and PBD_s be a PBD having strong
binding. Higher DoAMoM might, for example, favor
retention of GP(PBD_w) but also make it very difficult
to elute viable GP(PBD_s). We will optimize the
affinity separation to retain GP(PBD_w) rather than to
allow release of GP(PBD_s) because a tightly bound
GP(PBD_s) can be rescued by in situ growth. If we find
that DoAMoM strongly affects the elution volume, then
in Part III we may reduce the amount of target on the
affinity column when an SBD has been found with
moderately strong affinity (K_d on the order of
10^-[illegible] M) for the target.

In case the promoter of the osp-ipbd gene is not
regulated by a chemical inducer, we optimize DoAMoM,
the elution rate, and the amount of GP/volume of
matrix. If the optimized affinity separation is
acceptable, we proceed. If not, we must develop a
means to alter the amount of IPBD per GP. Among GPs
considered in the present invention, this case could
arise only for spores because regulatable promoters are
available for all other systems.

If the amount of IPBD/spore is too high, we could
engineer an operator site into the osp-ipbd gene. We
choose the operator sequence such that a repressor
sensitive to a small diffusible inducer recognizes the
operator. Alternatively, we could alter the Shine-
Dalgarno sequence to produce a lower homology with
consensus Shine-Dalgarno sequences. If the amount of
IPBD/spore is too low, we can introduce variability
into the promoter or Shine-Dalgarno sequences and
screen colonies for higher amounts of IPBD/spore.

In this step, we measure elution volumes of
genetically pure GPs that elute from the affinity
matrix as sharp bands that can be detected by UV
absorption. Alternatively, samples from effluent
fractions can be plated on suitable medium (cells or
spores) or on sensitive cells (phage) and colonies or
plaques counted.

Several values of IPBD/GP, DoAMoM, elution rates,
initial ionic strengths, and loadings should be
examined. The following is only one of many ways in
which the affinity separation could be optimized. We
anticipate that optimal values of IPBD/GP and DoAMoM
will be correlated and therefore should be optimized
together. The effects of initial ionic strength,
elution rate, and amount of GP/(matrix volume) are
unlikely to be strongly correlated, and so they can be
optimized independently.



For each set of parameters to be tested, the
column is eluted in a specified manner. For example,
we may use a regime called Elution Regime 1: a KCl
gradient runs from 10 mM to the maximum allowed for
GP(IPBD) viability in 100 fractions of 0.05 V_v,
followed by 20 fractions of 0.05 V_v at the maximum
allowed KCl; pH of the buffer is maintained at the
specified value with a convenient buffer such as
phosphate, Tris, or MOPS. Other elution regimes can be
used; what is important is that the conditions of this
optimization be similar to the conditions that are used
in Part III for selection for binding to target (Sec.
15.3) and recovery of GPs from the chromatographic
system (Sec. 15.4).
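
The fraction schedule of Elution Regime 1 can be written out as a
short sketch. The maximum KCl concentration depends on the
viability of the particular GP(IPBD) and is not given in the text,
so it is a required argument here; the function name and the
convention that the first and last gradient fractions carry the
two end-point concentrations are our assumptions.

```python
def elution_regime_1(kcl_max_mM, kcl_start_mM=10.0,
                     gradient_fractions=100, hold_fractions=20):
    """Return the nominal KCl concentration (mM) of each fraction.

    The gradient rises linearly from kcl_start_mM to kcl_max_mM over
    gradient_fractions fractions and is then held at kcl_max_mM for
    hold_fractions fractions. Each fraction is 0.05 V_v in the text.
    Assumption: the first and last gradient fractions sit exactly at
    the start and maximum concentrations.
    """
    if gradient_fractions < 2:
        raise ValueError("need at least two gradient fractions")
    if kcl_max_mM < kcl_start_mM:
        raise ValueError("maximum KCl must not be below the start value")
    step = (kcl_max_mM - kcl_start_mM) / (gradient_fractions - 1)
    gradient = [kcl_start_mM + i * step for i in range(gradient_fractions)]
    hold = [kcl_max_mM] * hold_fractions
    return gradient + hold


# Hypothetical maximum of 1000 mM KCl, chosen only for illustration.
schedule = elution_regime_1(1000.0)
print(len(schedule), schedule[0], schedule[-1])
```

With the default arguments the schedule has 120 fractions, that is
a total elution volume of 6 V_v.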



When the osp-ipbd gene is regulated by [XINDUCE],
IPBD/GP can be controlled by varying [XINDUCE].
Appropriate values of [XINDUCE] depend on the identity
of XINDUCE and the promoter; if, for example, XINDUCE
is isopropylthiogalactoside (IPTG) and the promoter is
lacUV5, then [IPTG] = 0, 0.1 uM, 1.0 uM, 10.0 uM, 100.0
uM, and 1.0 mM would be appropriate levels to test.
The range of variation of [XINDUCE] is extended until
an optimum is found or an acceptable level of
expression is obtained.



DoAMoM is varied from the maximum that the matrix
material can bind to 1% or 0.1% of this level in
appropriate steps. We anticipate that the efficiency
of separation will be a smooth function of DoAMoM so
that it is appropriate to cover a wide range of values
for DoAMoM with a coarse grid and then explore the
neighborhood of the approximate optimum with a finer
grid.
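
The coarse-then-fine exploration of DoAMoM can be sketched as
below. In practice each evaluation is a chromatographic
experiment; here the experiment is replaced by a hypothetical
scoring function, and the function names, the log-spaced coarse
grid, and the grid sizes are all our assumptions rather than
anything specified in the text.

```python
import math


def coarse_to_fine(score, low, high, coarse_points=7, fine_points=9):
    """Return the grid value in [low, high] with the highest score.

    A log-spaced coarse grid is evaluated first; a linear fine grid is
    then laid between the coarse neighbours of the best coarse point.
    `score` stands in for a measured separation efficiency.
    """
    if not 0 < low < high:
        raise ValueError("require 0 < low < high")
    ratio = (high / low) ** (1.0 / (coarse_points - 1))
    coarse = [low * ratio ** i for i in range(coarse_points)]
    best = max(range(coarse_points), key=lambda i: score(coarse[i]))
    lo = coarse[max(best - 1, 0)]
    hi = coarse[min(best + 1, coarse_points - 1)]
    step = (hi - lo) / (fine_points - 1)
    fine = [lo + i * step for i in range(fine_points)]
    return max(fine, key=score)


# Hypothetical smooth response peaking at 5% of the maximum DoAMoM.
def mock_efficiency(d):
    return -(math.log10(d) - math.log10(0.05)) ** 2


# DoAMoM expressed as a fraction of the maximum, from 0.1% to 100%.
best_doamom = coarse_to_fine(mock_efficiency, 0.001, 1.0)
print(best_doamom)
```

The smoothness assumption stated above is what justifies refining
only around the best coarse point; a response with several peaks
would call for a denser coarse grid.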

Several values of initial ionic strength are
tested, such as 1.0 mM, 5.0 mM, 10.0 mM and 20.0 mM.
Low ionic strength favors binding between oppositely
charged groups, but could also cause GP to precipitate.

The elution rate is varied, by successive factors
of 1/2, from the maximum attainable rate to 1/16 of
this value. If the lowest elution rate tested gives
the best separation, we test lower elution rates until
we find an optimum or adequate separation.



The goal of the optimization is to obtain a sharp
transition between bound and unbound GPs, triggered by
increasing salt or decreasing pH or a combination of
both. This optimization need be performed only: a) for
each temperature to be used, b) for each pH to be used,
and c) when a new GP(IPBD) is created.



Sec. 10.2: Measuring the sensitivity of affinity
separation:

Once the values of IPBD/GP, DoAMoM, initial ionic
strength, elution rate, and amount of GP/(volume of
affinity support) have been optimized, we determine the
sensitivity of the affinity separation (C_sensi) by the
following procedure that measures the minimum quantity
of GP(IPBD) that can be detected in the presence of a
large excess of wtGP. The user chooses a number of
separation cycles, denoted N_chrom, that will be
performed before an enrichment is abandoned;
preferably, N_chrom is in the range 6 to 10 and N_chrom
must be greater than 4. Enrichment can be terminated
by isolation of a desired GP(SBD) before N_chrom passes.



The measurement of sensitivity is significantly
expedited if GP(IPBD) and wtGP carry different
selectable markers because such markers allow easy
identification of colonies obtained by plating
fractions obtained from the chromatography column. For
example, if wtGP carries kanamycin resistance and
GP(IPBD) carries ampicillin resistance, we can plate
fractions from a column on non-selective media suitable
for the GP. Transfer of colonies onto ampicillin- or
kanamycin-containing media will determine the identity
of each colony.

Mixtures of GP(IPBD) and wtGP are prepared in the
ratios of 1:V_lim, where V_lim ranges by an appropriate
factor (e.g. 1/10) over an appropriate range, typically
10^[illegible] through 10^[illegible]. Large values of
V_lim are tested first; once a positive result is
obtained for one value of V_lim, no smaller values of
V_lim need be tested. Each mixture is applied to a
column supporting, at the optimal DoAMoM, an AfM(IPBD)
having high affinity for IPBD and the column is eluted
by the specified elution regime, such as Elution Regime
1. The last fraction that contains viable GPs and an
inoculum of the column matrix material are cultured.
If GP(IPBD) and wtGP have different selectable markers,
then transfer onto selection plates identifies each
colony. If GP(IPBD) and wtGP have no selectable
markers or the same selectable markers, then a number
(e.g. 32) of GP clonal isolates are tested for presence
of IPBD by the techniques discussed in Sec. 8. If IPBD
is not detected on the surface of any of the isolated
GPs, then GPs are pooled from: a) the last few (e.g. 3
to 5) fractions that contain viable GPs, and b) an
inoculum taken from the column matrix. The pooled GPs
are cultured and passed over the same column and
enriched for GP(IPBD) in the manner described. This
process is repeated until N_chrom passes have been
performed, or until the IPBD has been detected on the
GPs. If GP(IPBD) is not detected after N_chrom passes,
V_lim is decreased and the process is repeated.



Once a value for V_lim is found that allows
recovery of GP(IPBD)s, the factor by which V_lim is
varied is reduced and additional values are tested
until V_lim is known to within a factor of two.

C_sensi equals the highest value of V_lim for which
the user can recover GP(IPBD) within N_chrom passes.
The number of chromatographic cycles (K_cyc) that were
needed to isolate GP(IPBD) gives a rough estimate of
C_eff; C_eff is approximately the K_cyc-th root of V_lim:

C_eff = (approx.) exp( log_e(V_lim) / K_cyc )

For example, if V_lim were 4.0 x 10^8 and three
separation cycles were needed to isolate GP(IPBD), then
C_eff = (approx.) 736.

Sec. 10.3: Measuring the efficiency of separation:

To determine C_eff more accurately, we determine
the ratio of GP(IPBD)/wtGP loaded onto an AfM(IPBD)
column that yields approximately equal amounts of
GP(IPBD) and wtGP after elution. We prepare mixtures
of GP(IPBD) and wtGP in ratios GP(IPBD):wtGP :: 1:Q; we
start Q at twenty times the approximate C_eff found in
Sec. 10.2. A 1:Q mixture of GP(IPBD) and wtGP is
applied to an AfM(IPBD) column and eluted by the
specified elution regime, such as Elution Regime 1. A
sample of the last fraction that contains viable GPs is
plated at a dilution that gives well separated colonies
or plaques. The presence of IPBD or the osp-ipbd gene
in each colony or plaque can be determined by a number
of standard methods, including: a) use of different
selectable markers, b) nitrocellulose filter lift of
GPs and detection with AfM(IPBD)* (AUSU87), or c)
nitrocellulose filter lift of GPs and detection with
radiolabeled DNA that is complementary to the osp-ipbd
gene (AUSU87). Let F be the fraction of GP(IPBD)
colonies found in the last fraction containing viable
GPs. When a Q is found such that 0.20 < F < 0.80, then

C_eff = [equation illegible in source]

If F < 0.2, then we reduce Q by an appropriate factor
(e.g. 1/10) and repeat the procedure. If F > 0.8, then
we increase Q by an appropriate factor (e.g. 2) and
repeat the procedure.
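
The two estimates of C_eff described in Secs. 10.2 and 10.3 can be
sketched as follows. The rough estimate follows the stated formula
directly. The refined estimate is an assumption on our part: the
equation relating C_eff to Q and F is not legible in the text, so
the sketch takes the enrichment to be the ratio of GP(IPBD) to wtGP
after elution, F/(1 - F), divided by the ratio loaded, 1/Q. All
function names are hypothetical.

```python
import math


def c_eff_rough(v_lim, k_cyc):
    """Rough estimate from Sec. 10.2: the k_cyc-th root of v_lim."""
    return math.exp(math.log(v_lim) / k_cyc)


def c_eff_from_fraction(q, f):
    """ASSUMED refinement: (output ratio) / (input ratio) = Q*F/(1-F).

    This form is not taken from the text; it reduces to C_eff = Q when
    GP(IPBD) and wtGP are recovered in equal amounts (F = 0.5).
    """
    if not 0.0 < f < 1.0:
        raise ValueError("F must lie strictly between 0 and 1")
    return q * f / (1.0 - f)


def next_q(q, f, down=0.1, up=2.0):
    """Adjust Q as described: reduce if F < 0.2, increase if F > 0.8.

    Values of F from 0.2 to 0.8 leave Q unchanged (the end points are
    treated as acceptable here, which the text does not specify).
    """
    if f < 0.2:
        return q * down
    if f > 0.8:
        return q * up
    return q


# Worked example from the text: V_lim = 4.0e8, three cycles.
rough = c_eff_rough(4.0e8, 3)
print(rough)  # between 736 and 737

# Starting point for Sec. 10.3: twenty times the rough estimate.
q_start = 20 * rough
```

The asymmetric adjustment factors (1/10 downward, 2 upward) are
those given as examples in the text.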
Other separation means are optimized in a manner
parallel to that used for affinity chromatography.

FACS is likely to be most appropriate for
bacterial cells and spores because the sensitivity of
the machines requires approximately 1000 molecules of
fluorescent label bound to each GP to accomplish a
separation. An appropriate commercial FACS machine is
a FACStar from Becton-Dickinson, Mountain View, CA.
To optimize FACS separation of GPs, we use a derivative
of AfM(IPBD) that is labeled with a fluorescent
molecule, denoted AfM(IPBD)*. The variables that must
be optimized include: a) amount of IPBD/GP, b)
concentration of AfM(IPBD)*, c) ionic strength, d)
concentration of GPs, and e) parameters pertaining to
operation of the FACS machine. Because AfM(IPBD)* and
GPs interact in solution, the binding will be linear in
both [AfM(IPBD)*] and [displayed IPBD]. Preferably,
these two parameters are varied together. The other
parameters can be optimized independently. The
sensitivity and efficiency of the FACS separation are
determined in a manner parallel to those used for
chromatography.

Electrophoresis is most appropriate to
bacteriophage because of their small size. Serwer
(SERW87) has reviewed use of agarose-gel
electrophoresis to separate phage based on charge.
Electrophoresis is a preferred separation means if the
target is so small that chemically attaching it to a
column or to a fluorescent label would essentially
change the entire target. For example, chloroacetate
ions contain only seven atoms and would be essentially
altered by any linkage. GPs that bind chloroacetate
would become more negatively charged than GPs that do
not bind the ion and so these classes of GPs could be
separated.

The parameters to optimize for electrophoresis
include: a) IPBD/GP, b) concentration of gel material,
e.g. agarose, c) concentration of AfM(IPBD), d) ionic
strength, e) size, shape, and cooling capacity of the
electrophoresis apparatus, f) voltages and currents,
and g) concentration of GPs. Preferably, IPBD/GP and
[AfM(IPBD)] are varied at the same time and other
parameters are optimized independently.

In Part II we have determined optimal conditions
for separating GPs based on proteins displayed on the
GP surface. We have also determined the capabilities
of the affinity separation system. Knowledge of these
capabilities allows us to choose appropriate levels of
variegation in Part III.

Part III

Sec. 11.0: Choice of target material:

Any material may be chosen as target material,
subject only to the following restrictions:

If affinity chromatography is to be used, then:

1) the molecules of the target material must be of
sufficient size and chemical reactivity to be
applied to a solid support suitable for affinity
separation,

2) after application to a matrix, the target
material must not react with water,

3) after application to a matrix, the target
material must not bind or degrade proteins in a
non-specific way, and

4) the molecules of the target material must be
sufficiently large that attaching the material to
a matrix allows enough unaltered surface area
(generally at least 500 A^2, excluding the atom
that is connected to the linker) for protein
binding.

If FACS is to be used as the affinity separation
means, then:

1) the molecules of the target material must be of
sufficient size and chemical reactivity to be
conjugated to a suitable fluorescent dye or the
target must itself be fluorescent,

2) after any necessary fluorescent labeling, the
target must not react with water,

3) after any necessary fluorescent labeling, the
target material must not bind or degrade proteins
in a non-specific way, and

4) the molecules of the target material must be
sufficiently large that attaching the material to
a suitable dye allows enough unaltered surface
area (generally at least 500 A^2, excluding the
atom that is connected to the linker) for protein
binding.

If affinity electrophoresis is to be used, then:

1) the target must either be charged or of such a
nature that its binding to a protein will change
the charge of the protein,

2) the target material must not react with water,

3) the target material must not bind or degrade
proteins in a non-specific way, and

4) the target must be compatible with a suitable
gel material.



Possible target materials include, but are not
limited to:

1) horse heart myoglobin

2) cholesterol

3) [illegible]

4) yeast phenylalanyl tRNA

5) asbestos

6) alpha-fetoprotein

7) ras proteins

8) low density lipoprotein

9) prostaglandin PGE2

10) alpha interferon

11) melittin

12) Bordetella [remainder illegible]

13) aflatoxin B1

14) aspartame

15) haem

16) bilirubin

17) morphine

18) codeine

19) dichlorodiphenyltrichloroethane (DDT)

20) benzo(a)pyrene

21) actinomycin D

22) any protease

[number illegible]) any retroviral [illegible]

[number illegible]) any retroviral [illegible]

[number illegible]) fibril or [illegible] of
several spirochaete species, e.g. organisms
causing syphilis [remainder illegible]

[number illegible]) E. coli enterotoxin protein

31) zeolites

32) cellulose

33) hydroxylapatite

34) DNA of a defined sequence

35) fibrin

36) tumor necrosis factor

37) specific monoclonal antibodies



A supply of several milligrams of pure target
material is desired. Impure target material could be
used, but one might obtain a protein that binds to a
contaminant instead of to the target.

The following information about the target
material is highly desirable:

1) stability as a function of temperature, pH, and
ionic strength,

2) stability with respect to chaotropes such as
urea or guanidinium Cl,

3) pI,

4) molecular weight,

5) requirements for prosthetic groups or ions,
such as haem or Ca2+, and

6) proteolytic activity, if any.

In addition to this most desirable information, it
is useful to know: 1) the target's sequence, if the
target is a macromolecule, 2) the 3D structure of the
target, 3) enzymatic activity, if any, and 4) toxicity,
if any.

The user of the present invention specifies
certain parameters of the intended use of the binding
protein:

1) the acceptable temperature range,

2) the acceptable pH range,

3) the acceptable concentrations of ions and
neutral solutes,

4) the maximum acceptable dissociation constant
for the target and the SBD:

K_T = [Target][SBD]/[Target:SBD]

In some cases, the user may require discrimination
between T, the target, and N, some non-target. Let

K_T = [T][SBD]/[T:SBD], and
K_N = [N][SBD]/[N:SBD],

then K_T/K_N = ([T][N:SBD])/([N][T:SBD]).

The user then specifies a maximum acceptable value for
the ratio K_T/K_N.
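
The two user-specified binding criteria can be checked with a few
lines of arithmetic. The sketch below is illustrative only: the
function names, the concentrations, and the two limits are
hypothetical values chosen to show the calculation.

```python
import math


def dissociation_constant(free_ligand, free_sbd, complex_conc):
    """K = [ligand][SBD]/[ligand:SBD], all in the same molar unit."""
    if complex_conc <= 0:
        raise ValueError("complex concentration must be positive")
    return free_ligand * free_sbd / complex_conc


def meets_specification(k_t, k_n, max_k_t, max_ratio):
    """True if K_T and K_T/K_N are both within the user's limits."""
    return k_t <= max_k_t and (k_t / k_n) <= max_ratio


# Hypothetical equilibrium concentrations (molar).
free_sbd = 1.0e-6
t_free, t_bound = 1.0e-6, 1.0e-3   # target T and T:SBD
n_free, n_bound = 1.0e-3, 1.0e-3   # non-target N and N:SBD

k_t = dissociation_constant(t_free, free_sbd, t_bound)   # about 1e-9 M
k_n = dissociation_constant(n_free, free_sbd, n_bound)   # about 1e-6 M

# The free SBD concentration cancels in the ratio, as in the text.
ratio_direct = (t_free * n_bound) / (n_free * t_bound)
print(k_t / k_n, ratio_direct)

# Hypothetical limits: K_T at most 1e-8 M, K_T/K_N at most 1e-2.
ok = meets_specification(k_t, k_n, max_k_t=1.0e-8, max_ratio=1.0e-2)
```

A smaller K_T/K_N means stronger discrimination in favor of the
target, which is why the user specifies a maximum for this ratio.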

The target material must be stable under the
specified conditions of pH, temperature, and solution
conditions.

If the target material is a protease, one must
consider the following points:

1) a highly specific protease can be treated like
any other target,

2) a general protease, such as subtilisin, may
degrade the OSPs of the GP including OSP-PBDs;
there are several alternative ways of dealing with
general proteases, including: a) a chemical
inhibitor may be used to prevent proteolysis (e.g.
phenylmethylsulfonyl fluoride (PMSF), which
inhibits serine proteases), b) one or more
active-site residues may be mutated to create an
inactive protein (e.g. a serine protease in which
the active serine is mutated to alanine), or c)
one or more active-site amino acids of the protein
may be chemically modified to destroy the
catalytic activity (e.g. a serine protease in
which the active serine is converted to
anhydroserine),

3) SBDs selected for binding to a protease need
not be inhibitors; SBDs that happen to inhibit
the protease target are a fairly small subset of
SBDs that bind to the protease target,

4) the more we modify the target protease, the
less likely we are to obtain an SBD that inhibits
the target protease, and

5) if the user requires that the SBD inhibit the
target protease, then the active site of the
target protease must not be modified any more than
necessary; inactivation by mutation or chemical
modification are preferred methods of inactivation
and a protein protease inhibitor becomes a prime
candidate for IPBD. For example, BPTI could be
mutated, by the methods of the present invention,
to bind to proteases other than trypsin (TANK77
and TSCH87).

Sec. 12.0: Choice of GP(IPBD):

The user must pick a GP(IPBD) that is suitable to
the chosen target according to the criteria of Sec. 2.
It is anticipated that a small collection of
GP(IPBD)s can be assembled such that, for any chosen
target, at least one member of the collection will be a
suitable starting point for engineering a protein that
binds to the chosen target by the methods of the
present invention.

If the pH, temperature, or other parameters of the
intended use of the selected SBD differ markedly from
the conditions used to optimize the affinity separation
for the chosen GP(IPBD), then the user should optimize
the affinity separation for conditions appropriate to
the intended use by the methods described in Part II.

Sec. 13.0: Identification of Family of PBDs, Related
to IPBD, to Be Generated.

Sec. 13.1: Choosing residues on IPBD to vary:

We choose residues in the IPBD to vary through
consideration of several factors, including: a) the 3D
structure of the IPBD, b) sequences homologous to IPBD,
and c) modeling of the IPBD and mutants of the IPBD.
Because the number of residues that could strongly
influence binding is always greater than the number
that can be varied simultaneously, the user must pick a
subset of those residues to vary at one time. The user
must also pick trial levels of variegation and
calculate the abundances of various sequences. The
list of varied residues and the level of variegation at
each varied residue are adjusted until the composite
variegation is commensurate with C_sensi.

We now consider the principles that guide our
choice of residues of the IPBD to vary. A key concept
is that only structured proteins exhibit specific
binding, i.e. can bind to a particular chemical entity
to the exclusion of most others. Thus the residues to
be varied are chosen with an eye to preserving the
underlying IPBD structure. Substitutions that prevent
the PBD from folding will cause GPs carrying those
genes to bind indiscriminately so that they can easily
be removed from the population.

Burial of hydrophobic surfaces so that bulk water
is excluded is one of the strongest forces driving the
binding of proteins to other molecules. Bulk water can
be excluded from the region between two molecules only
if the surfaces are complementary. We must test as
many surfaces as possible to find one that is
complementary to the target. The selection-through-
binding isolates those proteins that are more nearly
complementary to some surface on the target. The
effective diversity of a variegated population is
measured by the number of different surfaces, rather
than the number of protein sequences. Thus we should
maximize the number of surfaces generated in the
population, rather than the number of protein
sequences.

In hypothetical example 1, we consider a
hypothetical PBD, shown in Figure 4, binding to a
hypothetical target. Figure 4 is a 2D schematic of 3D
objects; by hypothesis, residues 1, 2, 4, 6, 7, 13, 14,
15, 20, 21, 22, 27, 29, 31, 33, 34, 36, 37, 38, and 39
of the IPBD are on the 3D surface of the IPBD, even
though shown well inside the circle. Proteins do not
have distinct, countable faces. Therefore we define an
"interaction set" to be a set of residues such that all
members of the set can simultaneously touch one
molecule of the target material without any atom of the
target coming closer than van der Waals distance to any
main-chain atom of the IPBD. The concept of a residue
"touching" a molecule of the target is discussed below.
One hypothetical interaction set, Set A, in Figure 4
comprises residues 6, 7, 20, 21, 22, 33, and 34,
represented by squares. Another hypothetical
interaction set, Set B, comprises residues 1, 2, 4, 6,
31, 37, and 39, represented by circles.

If we vary one residue, number 21 for example,
through all twenty amino acids, we obtain 20 protein
sequences and 20 different surfaces for interaction set
A. Note that residue 6 is in two interaction sets and
variation of residue 6 through all 20 amino acids
yields 20 versions of interaction set A and 20 versions
of interaction set B.

Now consider varying two residues, each through
all twenty amino acids, generating 400 protein
sequences. If the two residues varied were, for
example, number 1 and number 21, then there would be
only 40 different surfaces because interaction set A
does not depend on residue 1 and interaction set B does
not depend on residue 21. If the two residues varied,
however, were number 7 and number 21, then 400 surfaces
would be generated.



If N spatially separated residues are varied at
one time, 20 x N surfaces are generated. Variation of
N residues in the same interaction set yields 20^N
surfaces. For example, if N = 7, variation of
separated residues yields 140 surfaces while variation
of interacting residues yields 20^7 = 1.28 x 10^9
surfaces. Thus, to maximize the number of surfaces
generated when N residues are varied, all residues
should be in the same interaction set because variation
of several residues in one interaction set generates an
exponential number of surfaces while variation of
spatially separated surface residues generates only a
linear number.
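
The counting argument above can be checked with a short sketch.
The interaction sets used here are abridged, hypothetical stand-ins
that contain only the residues named in the worked examples; the
function name and the rule that each interaction set contributes
20^k versions when k of its residues are varied are our reading of
the text, not a formula stated in it.

```python
def count_surfaces(varied, interaction_sets, n_types=20):
    """Count surface versions produced by varying the given residues.

    Each interaction set that contains k >= 1 varied residues
    contributes n_types**k versions; sets containing no varied
    residue contribute nothing new.
    """
    total = 0
    for members in interaction_sets.values():
        k = len(set(varied) & set(members))
        if k:
            total += n_types ** k
    return total


# Abridged stand-ins for Set A and Set B (residue 6 is shared).
sets = {"A": {6, 7, 21}, "B": {1, 6}}

print(count_surfaces([21], sets))     # 20
print(count_surfaces([6], sets))      # 40
print(count_surfaces([1, 21], sets))  # 40
print(count_surfaces([7, 21], sets))  # 400

# General comparison for N = 7 varied residues.
n = 7
separated = 20 * n      # 140
interacting = 20 ** n   # 1280000000
```

Residues 7 and 21 lie in the same set and so multiply, whereas
residues 1 and 21 lie in different sets and only add.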



The amount of surface area buried in strong protein-protein interactions ranges from 1000 Å^2 to 2000 Å^2, as summarized by Schulz and Schirmer (SCHU79, p103ff). Individual amino acids have total surface areas that depend mostly on type of amino acid and

weakly on conformation. These areas range from about 150 Å^2 for glycine to about 360 Å^2 for tryptophan.

Averages of total surface area by amino acid type and maximum exposed surface area of each amino acid type for two typical proteins, hen egg white lysozyme (HEWL) and T4 lysozyme (T4L), are shown in Table 6. From these exposures, one can calculate that 1000 Å^2 on a protein surface comprises between 4 and 30 amino acids, depending on the amino acid types and the protein 3D structure. Varied amino acid sequences, as found in actual proteins, involve between 10 and 25 residues in forming 1000 Å^2 of protein surface. Schulz and Schirmer estimate that 100 Å^2 of protein surface can exhibit as many as 1000 different specific patterns (SCHU79, p105). The number of surface patterns rises exponentially with the area that can be varied independently. One of the BPTI structures recorded in the Brookhaven Protein Data Bank (6PTI), for example, has a total exposed surface area of 3997 Å^2 (using the method of Lee and Richards (LEEB71) and a solvent radius of 1.4 Å and atomic radii as shown in Table 7). If we could vary this surface freely and if 100 Å^2 can produce 1000 patterns, we could construct 10^120 different patterns by varying the surface of BPTI. This calculation is intended only to suggest the huge number of surface patterns that could, in principle, be based on one protein backbone.
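The order-of-magnitude estimate above can be reproduced as follows. This is only a sketch of the arithmetic, assuming (as the text does for the sake of argument) that every 100 Å^2 patch can be varied independently; the function name and default values are illustrative.

```python
import math


def log10_pattern_count(exposed_area: float,
                        patch_area: float = 100.0,
                        patterns_per_patch: float = 1000.0) -> float:
    """Return log10 of the number of surface patterns if each patch of
    `patch_area` could independently show `patterns_per_patch` patterns."""
    n_patches = exposed_area / patch_area
    return n_patches * math.log10(patterns_per_patch)


if __name__ == "__main__":
    # 3997 square angstroms is the exposed area quoted for the 6PTI structure.
    print(f"about 10^{log10_pattern_count(3997.0):.0f} patterns")
```

Because the backbone holds the side groups in nearly fixed positions, the patches are not truly independent, so the figure is an upper bound on what a single framework can display.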

One protein framework cannot, however, display all possible patterns over any one particular 100 Å^2 of surface merely by replacement of the side groups of surface residues. The protein backbone holds the varied side groups in approximately constant locations so that the variations are not independent. We can, nevertheless, generate a vast collection of different protein surfaces by varying those protein residues that face the outside of the protein.

Figure 5 shows BPTI in contact with myoglobin. From this we can see that residues 3, 7, 8, 10, 13, 39, 41, and 42 can all simultaneously contact a molecule the size and shape of myoglobin. Figure 5 also shows that residue 49 can not touch a single myoglobin molecule simultaneously with any of the first set even though all are on the surface of BPTI. It is not the intent of the present invention, however, to use models to determine which part of the target molecule will actually be the site of binding by PBD.

If cassette mutagenesis is picked, the protein residues to be varied are, preferably, close enough together in sequence that the variegated DNA (vgDNA) encoding all of them can be made in one piece. The present invention is not limited to a particular length of vgDNA that can be synthesized. With current technology, a stretch of 60 amino acids (180 DNA bases) can be spanned.

Further, when there is reason to mutate residues further than sixty residues apart, one can use other mutational means, such as single-stranded-oligonucleotide-directed mutagenesis (BOTS85) using two or more mutating primers.

Alternatively, to vary residues separated by more than sixty residues, two cassettes may be mutated as follows:

1) vgDNA having a low level of variegation (for example, 20 to 400 fold variegation) is introduced into one cassette in the OCV,

2) cells are transformed and cultured,

3) vg OCV DNA is obtained,

4) a second segment of vgDNA is inserted into a second cassette in the OCV, and

5) cells are transformed and cultured, GPs are harvested and subjected to selection-through-binding.

The composite level of variation must not exceed the prevailing capabilities to a) produce very large numbers of independently transformed cells or b) detect small components in a highly varied population. The limits on the level of variegation are discussed in Sec. 13.2.

Here we assemble the data about the IPBD and the target that are useful in deciding which residues to vary in the variegation cycle:

1) 3D structure, or at least a list of residues on the surface of the IPBD,

2) list of sequences homologous to IPBD, and

3) model of the target molecule or a stand-in for the target.

These data and an understanding of the behavior of different amino acids in proteins will be used to answer two questions:

1) which residues of the IPBD are on the outside and close enough together in space to touch the target simultaneously?

2) which residues of the IPBD can be varied with high probability of retaining the underlying IPBD structure?

Although an atomic model of the target material (obtained through X-ray crystallography, NMR, or other means) is preferred in such examination, it is not necessary. For example, if the target were a protein of unknown 3D structure, it would be sufficient to know the molecular weight of the protein and whether it were a soluble globular protein, a fibrous protein, or a membrane protein. Physical measurements, such as low-angle neutron diffraction, can determine the overall molecular shape, viz. the ratios of the principal moments of inertia. One can then choose a protein of known structure of the same class and similar size and shape to use as a molecular stand-in and yardstick. It is not essential to measure the moments of inertia of the target because, at low resolution, all proteins of a given size and class look much the same. The specific volumes are the same, all are more or less spherical and therefore all proteins of the same size and class have about the same radius of curvature. The radii of curvature of the two molecules determine how much of the two molecules can come into contact.

Several graphical and computational tools are needed or useful. The most appropriate method of picking the residues of the protein chain at which the amino acids should be varied is by viewing, with interactive computer graphics, a model of the IPBD. A stick-figure representation of molecules is preferred. A suitable set of hardware is an Evans & Sutherland PS390 graphics terminal (Evans & Sutherland Corporation, Salt Lake City, UT) and a MicroVAX II supermicro computer (Digital Equipment Corp., Maynard, MA). The computer should, preferably, have at least 150 megabytes of disk storage, so that the Brookhaven Protein Data Bank can be kept on line. A FORTRAN compiler, or some equally good higher-level language processor, is preferred for program development. Suitable programs for viewing and manipulating protein models include: a) PS-FRODO, written by T. A. Jones (JONE85) and distributed by the Biochemistry Department of Rice University, Houston, TX; and b) PROTEUS, developed by Dayringer, Tramontano, and Fletterick (DAYR86). Important features of PS-FRODO and PROTEUS that are needed to view and manipulate protein models for the purposes of the present invention are the abilities to: 1) display molecular stick figures of proteins and other molecules, 2) zoom and clip images in real time, 3) prepare various abstract representations of the molecules, such as a line joining C-alpha and side group atoms, 4) compute and display solvent-accessible surfaces reasonably quickly, 5) point to and identify atoms, and 6) measure distance between atoms.

In addition, one could use theoretical calculations, such as dynamic simulations of proteins, to estimate whether a substitution at a particular residue of a particular amino-acid type might produce a protein of approximately the same 3D structure as the parent protein. Such calculations might also indicate whether a particular substitution will greatly affect the flexibility of the protein; calculations of this sort may be useful but are not required.

Sec. 13.1.1: The principal set:

In this section we pick a principal set of residues of the IPBD to vary. Using the knowledge of which residues are on the surface of the IPBD (as noted above), we pick residues that are close enough together on the surface of the IPBD to touch a molecule of the target simultaneously without having any IPBD main-chain atom come closer than van der Waals distance (viz. 4.0 to 5.0 Å) from any target atom. For the purposes of the present invention, a residue of the IPBD "touches" the target if: a) a main-chain atom is within van der Waals distance, viz. 4.0 to 5.0 Å, of any atom of the target molecule, or b) the C_beta is within D_cutoff of any atom of the target molecule so that a side-group atom could make contact with that atom. Because side groups differ in size (cf. Table 35), some judgment is required in picking D_cutoff. In the preferred embodiment, we will use D_cutoff = 8.0 Å, but other values in the range 6.0 Å to 10.0 Å could be used. If IPBD has G at a residue, we construct a pseudo C_beta with the correct bond distance and angles and judge the ability of the residue to touch the target from this pseudo C_beta.
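The "touching" test defined above lends itself to a small sketch. The code below is illustrative only: coordinates are plain (x, y, z) tuples in Å, the function and constant names are hypothetical, the upper end of the 4.0 to 5.0 Å range is assumed as the main-chain contact threshold, and the caller is assumed to supply a pseudo C_beta for glycine.

```python
import math
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float, float]

VDW_CONTACT = 5.0  # assumed main-chain contact threshold (upper end of 4.0-5.0 A)
D_CUTOFF = 8.0     # preferred C-beta cutoff; 6.0 to 10.0 A could also be used


def min_distance(atoms: Iterable[Point], target_atoms: Sequence[Point]) -> float:
    """Smallest distance between any atom in `atoms` and any target atom."""
    return min(math.dist(a, t) for a in atoms for t in target_atoms)


def touches(main_chain: Sequence[Point],
            c_beta: Point,
            target_atoms: Sequence[Point],
            d_cutoff: float = D_CUTOFF,
            vdw: float = VDW_CONTACT) -> bool:
    """Return True if the residue 'touches' the target under criterion
    a) a main-chain atom within van der Waals distance, or
    b) the (pseudo) C-beta within d_cutoff of any target atom."""
    if min_distance(main_chain, target_atoms) <= vdw:
        return True
    return min_distance([c_beta], target_atoms) <= d_cutoff


if __name__ == "__main__":
    target = [(0.0, 0.0, 0.0)]
    print(touches([(10.0, 0.0, 0.0)], (7.0, 0.0, 0.0), target))
```

Raising or lowering `d_cutoff` within the stated 6.0 to 10.0 Å range trades inclusiveness against the risk of counting residues whose side groups cannot actually reach the target.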



Alternatively, we choose a set of residues on the surface of the IPBD such that the curvature of the surface defined by the residues in the set is not so great that it would prevent contact between all residues in the set and a molecule of the target. This method is appropriate if the target is a macromolecule, such as a protein, because the PBDs derived from the IPBD will contact only a part of the macromolecular surface. The surfaces of macromolecules are irregular with varying curvatures. If we pick residues that define a surface that is not too convex, then there will be a region on a macromolecular target with a compatible curvature.

30 

In addition to the geometrical criteria, we prefer that there be some indication that the underlying IPBD structure will tolerate substitutions at each residue in the principal set of residues. Indications could come from various sources, including: a) homologous sequences, b) static computer modeling, or c) dynamic computer simulations.



The residues in the principal set need not be contiguous in the protein sequence. The exposed surfaces of the residues to be varied do not need to be connected. We require only that the amino acids in the residues to be varied all be capable of touching a molecule of the target material simultaneously without having atoms overlap. If the target were, for example, horse heart myoglobin, and if the IPBD were BPTI, any set of residues in one interaction set of BPTI defined in Table 34 could be picked.

Preferably, the principal set contains eight to sixteen residues. This number of residues allows sufficient variability that a surface that is complementary to the target can be found, but is small enough that a significant fraction of the surface can be varied at one time.



Sec. 13.1.2: The secondary set:

The secondary set comprises those residues not in the primary set that touch residues in the primary set. These residues might be excluded from the primary set because: a) the residue is internal, b) the residue is highly conserved, or c) the residue is on the surface, but the curvature of the IPBD surface prevents the residue from being in contact with the target at the same time as one or more residues in the primary set.

Internal residues are frequently conserved and the amino acid type can not be changed to a significantly different type without substantial risk that the protein structure will be disrupted. Nevertheless, some conservative changes of internal residues, such as I to L or F to Y, are tolerated. Such conservative changes affect the detailed placement and dynamics of adjacent protein residues and such variation may be useful once an SBD is found.

Surface residues in the secondary set are most often located on the periphery of the principal set. Such peripheral residues can not make direct contact with the target simultaneously with all the other residues of the principal set. The charge on the amino acid in one of these residues could, however, have a strong effect on binding. Once an SBD is found, it is appropriate to vary the charge of some or all of these residues. For example, the variegated codon containing equimolar A and G at base 1, equimolar C and A at base 2, and A at base 3 yields amino acids T, A, K, and E with equal probability.

Choice of residues to vary initially:

Choice of residues in the primary and secondary set is based on: a) geometry of the IPBD and the geometrical relationship between the IPBD and the target (or a stand-in for the target) in a hypothetical complex, and b) sequences of proteins homologous to the IPBD. In this section we pick a subset of the residues in the primary and secondary sets, based on geometry and on the maximum allowed level of variegation that assures progressivity. The allowed level of variegation determines how many residues can be varied at once; geometry determines which ones.

The user may pick residues to vary in many ways; the following is a preferred manner. Pairs of residues are picked that are diametrically opposed across the face of the principal set. Two such pairs are used to delimit the surface, up/down and right/left. Alternatively, three residues that form an inscribed triangle, having as large an area as possible, on the surface are picked. One to three other residues are picked in a checkerboard fashion across the interaction surface. Choice of widely spaced residues to vary creates the possibility for high specificity because all the intervening residues must have acceptable complementarity before favorable interactions can occur at widely-separated residues.

The number of residues picked is coupled to the range through which each can be varied by the restrictions discussed in Sec. 13.2. In the first round, we do not assume any binding between IPBD and the target and so progressivity is not an issue. At the first round, the user may elect to produce a level of variegation such that each molecule of vgDNA is potentially different through, for example, unlimited variegation of 10 codons (20^10 = approx. 10^13). One

run of the DNA synthesizer produces approximately 10^13 molecules of length 100 nts. Inefficiencies in ligation and transformation will reduce the number of proteins actually tested to between 10^7 and 5 x 10^8. Multiple replications of the process with such very high levels of variegation will not yield repeatable results; the user must decide whether this is important.

Sec. 13.2: Range of Variation at Each Site of Mutation:

Having picked which residues to vary, we must now decide the range of amino acids to allow at each variable residue. The total level of variegation is the product of the number of variants at each varied residue. Each varied residue can have a different scheme of variegation, producing 2 to 20 different possibilities. We require that the process be progressive, i.e. each variegation cycle produces a better starting point for the next variegation cycle than the previous cycle produced.
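As a minimal illustration of the product rule just stated, the sketch below computes the total level of variegation from a list giving the number of variants allowed at each varied residue. The function name is hypothetical; the check that each residue offers 2 to 20 possibilities follows the range given in the text.

```python
import math
from typing import Sequence


def total_variegation(variants_per_residue: Sequence[int]) -> int:
    """Total level of variegation: the product of the number of variants
    allowed at each varied residue (each between 2 and 20)."""
    for n in variants_per_residue:
        if not 2 <= n <= 20:
            raise ValueError("each varied residue should allow 2 to 20 variants")
    return math.prod(variants_per_residue)


if __name__ == "__main__":
    # Five residues through all twenty amino acids.
    print(total_variegation([20] * 5))
    # A mixed scheme: one fully varied residue, one four-way, one two-way.
    print(total_variegation([20, 4, 2]))
```

A mixed scheme lets more residues be varied at once for the same total level of variegation.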

N.B.: Setting the level of variegation such that the ppbd and many sequences related to the ppbd sequence are present in detectable amounts insures that the process is progressive. If the level of variegation is so high that the ppbd sequence is present at such low levels that there is an appreciable chance that no transformant will display the PPBD, then the best SBD of the next round could be worse than the PPBD. At excessively high level of variegation, each round of mutagenesis is independent of previous rounds and there is no assurance of progressivity. This approach can lead to valuable binding proteins, but repetition of experiments with this level of variegation will not yield progressive results. Excessive variation is not preferred.

Hypothetical example 2 considers the effects of the level of variegation on the progressivity of the process of the present invention. Figure 6 is a schematic view of a hypothetical eight-residue binding surface of a PBD comprising residues 11, 24, 25, 30, 34, 42, 44, and 47 of a hypothetical protein. Each polygon represents the exposed portion of one residue. By hypothesis, there exists at least one protein, shown in Figure 6e, having a specific amino acid in each of the eight residues that will bind to the target, but we do not, at first, know what that sequence is.

The IPBD, shown in Figure 6a, may have none of the optimal amino acids on its surface. Because we begin with no information, our initial estimate is that all amino acids have equal likelihood of being the best at each of the eight residues.

Dy hypothesis, the genetic engineering system of 
hypothetical example 2 has ^ntv ^ 
selection-through-binding system has C^^nsi ~ 
Also by hypothesis, the variegation method can produce all amino acids at a given residue with equal probability.

In the first variegation, we vary residues 11, 24, 25, 34, and 44 through all twenty amino acids, producing 20^5 = 3.2 x 10^6 sequences. The capabilities of the genetic engineering system allow all these sequences to be present in the selection step and the
selection system can detect 1 G? in 10^. By 
hypothesis, we isolate a GP carrying an sbd gene that encodes the first SBD, shown in Figure 6b, that has improved binding for the target and has the amino acid sequence W11-F24-E25-G30-D34-E42-P44-T47. This amino acid sequence becomes the parental sequence to the next variegation. After the first variegation and selection, the evidence favors W11, F24, E25, D34, and P44 as optimal amino acids at their respective residues. That residues 30, 42, and 47 were not varied has two implications:

1) we still have no information about which amino acid is optimal at these residues, and

2) the amino acids selected at the varied residues are optimal, given the identities of the amino acids in the non-varied residues; when residues 30, 42, and 47 are varied, our estimate of the optimal amino acids in other residues may change.



Now consider two versions of a variegation that take the first intermediate SBD as parent and that might get us closer to the optimal SBD.

In the first version of the second variegation, we vary only five residues, producing 3.2 x 10^6 sequences, all of which are expressed and subjected to selection-through-binding. We vary residues 30, 42, and 47 because they were not varied previously. We also vary two other residues so that as many surfaces as possible are tested; residues 24 and 34 are chosen. Suppose that we isolate a GP that carries an sbd gene encoding the amino acid sequence W11-L24-E25-I30-D34-R42-P44-K47, shown in Figure 6c. Consider the reason that D is retained at residue 34. We know that all the sequences W11-L24-E25-I30-X34-R42-P44-K47 (where X runs through all twenty amino acids) were tested and therefore can conclude with improved confidence that D34 is optimal, given the rest of the selected sequence. Now consider the change at residue 24 from F to L. We know that all the sequences W11-X24-E25-I30-D34-R42-P44-K47 were tested and we can conclude that L24 is optimal, given the rest of the sequence. At each of the varied residues, we gain information about which amino acids are optimal at each varied residue under the conditions imposed.

In the second version, we will vary residues 11, 24, 30, 34, 42, and 47, each through all twenty amino acids, producing 20^6 = 6.4 x 10^7 possible different sequences. Our hypothesis is that only 1.0 x 10^7 of these sequences are produced and subjected to selection. Because only 15.6% of the programmed sequences are actually subjected to selection, it is likely that the parental sequence, W11-F24-E25-G30-D34-E42-P44-T47, is not present in the selection step and there is, consequently, no assurance that the best SBD binds more tightly to target than did the parental PBD. Suppose that we isolate a GP that carries an sbd gene encoding the amino acid sequence V11-R24-E25-Q30-D34-R42-P44-D47, shown in Figure 6d. Consider the reason that D is retained at residue 34. Is it that D is optimal, or is it that, by chance, the sequence encoding the optimal amino acid, X, was not present as V11-R24-E25-Q30-X34-R42-P44-D47 in the sample? We do not know and therefore can not conclude that D34 is optimal. Furthermore, retaining an amino acid can not move us toward the optimal sequence. Now consider the change at residue 24 from F to R. Was V11-R24-E25-Q30-D34-R42-P44-D47 selected because R24 is optimal in the presence of V11-X24-E25-Q30-D34-R42-P44-D47, or was V11-R24-E25-Q30-D34-R42-P44-D47 selected because V11-F24-E25-Q30-D34-R42-P44-D47 was not present to be selected? Again, we do not know and can not conclude that R24 is an improvement, i.e. we can not conclude that R24 is more likely to be optimal than is F24. In both cases, we lose information about which amino acids belong at each residue. We may have obtained an SBD with superior binding to the target. Another variegation cycle at this level of variegation, however, may produce a better protein or a worse protein and the process is not progressive.
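The numbers in the two versions of the second variegation follow from simple arithmetic, sketched below. The function names are illustrative; the figure of 1.0 x 10^7 sequences actually tested is the hypothesis stated in the text, not a measured value.

```python
def library_size(n_varied: int, n_types: int = 20) -> int:
    """Number of programmed sequences when n_varied residues are each
    varied through n_types amino acids."""
    return n_types ** n_varied


def fraction_tested(n_tested: float, n_varied: int) -> float:
    """Fraction of the programmed sequences that reach the selection step,
    capped at 1.0 when capacity exceeds the library size."""
    return min(1.0, n_tested / library_size(n_varied))


if __name__ == "__main__":
    capacity = 1.0e7  # hypothetical number of sequences produced and selected
    for n in (5, 6):
        size = library_size(n)
        print(n, size, f"{100 * fraction_tested(capacity, n):.1f}%")
```

With five varied residues every programmed sequence, including the parent, can be present; with six, only about one sequence in six is sampled, so absence of the parent from the selection step is the likelier outcome.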

Let us contrast versions 1 and 2 of the second variegation. In version 1, we retained more information, viz. that W11 allows improved binding, and therefore our selection of K47 incorporates the information obtained in the previous rounds. In version 2 of the second variegation, we discarded the information that W11 allows stronger binding than V11.

Progressivity is not an all-or-nothing property. So long as most of the information obtained from previous variegation cycles is retained and many different surfaces that are related to the PPBD surface are produced, the process is progressive. If the level of variegation is so high that the ppbd gene may not be detected, the assurance of progressivity diminishes. If the probability of recovering PPBD is negligible, then the probability of progressive behavior is also negligible.

An opposing force in our design considerations is that PBDs are useful in the population only up to the amount that can be detected; any excess above the detectable amount is wasted. Thus we produce as many surfaces related to PPBD as possible within the constraint that the PPBD be detectable.

We defer specification of exactly how much variegation is allowed until we have: a) specified real nt distributions for a variegated codon, and b) examined the effects of discrepancies between specified nt distributions and actual nt distributions.

Sec. 13.3: Design of vgDNA Encoding PBD Family:

We must now decide how to distribute the variegation within the codons for the residues to be varied. These decisions are influenced by the nature of the genetic code. When vgDNA is synthesized, variation at the first base of a codon creates a population containing amino acids from the same column of the genetic code table (as shown in the Table 3-6 on p87 of WATS87); variation at the second base of the codon creates a population containing amino acids from the same row of the genetic code table; variation at the third base of the codon creates a population containing amino acids from the same box. If two or three bases in the same codon are varied, the pattern is more complicated. Work with 3D protein structural models may suggest definite sets of amino acids to substitute at a given residue, but the method of variation may require either more or fewer kinds of amino acids be included. For example, examination of a model might suggest substitution of N or Q at a given residue. Combinatorial variation of codons requires that mixing N and Q at one location also include K and H as possibilities at the same residue. One must choose to put: 1) N only, 2) Q only, or 3) a mixture of N, K, H, and Q. The present invention does not rely on accurate predictions of which amino acids should be placed at each residue, rather attention is focused on which residues should be varied.
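The N/Q example above can be verified by enumerating the codons that a mixed synthesis produces. The sketch below builds the standard genetic code table and lists the amino acids encoded when each base position is drawn from a given set of nucleotides; the helper name `encoded_amino_acids` is illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def encoded_amino_acids(base1: str, base2: str, base3: str) -> set:
    """Amino acids (or '*' for stop) encoded when each codon position is a
    mixture of the nucleotides listed for that position."""
    return {CODON_TABLE[a + b + c] for a, b, c in product(base1, base2, base3)}


if __name__ == "__main__":
    # N is AAC (or AAT); Q is CAG (or CAA). Mixing the two codons position
    # by position varies base 1 over A/C and base 3 over C/G.
    print(sorted(encoded_amino_acids("AC", "A", "CG")))
```

Mixing at the first and third positions at once cannot produce N and Q without also producing the two cross combinations, which is why K and H appear.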

There are many ways to generate diversity in a protein. (See RICH86, CARU85, and OLIP86.) One extreme case is that one or a few residues of the protein are varied as much as possible (inter alia see CARU85, CARU87, RICH86, and WHAR86). We will call this limit "Focused Mutagenesis". Focused Mutagenesis is appropriate when the IPBD or other PPBD shows little or no binding to the target, as at the beginning of the search for a protein to bind to a new target material. When there is no binding between the PPBD and the target, we preferably pick a set of five to seven residues on the surface and vary each through all 20 possibilities.

An alternative plan of mutagenesis ("Diffuse Mutagenesis") that may be useful is to vary many more residues through a more limited set of choices (See Vershon et al., Ch15 of INOU86 and PAKU86). This can be accomplished by spiking each of the pure nts activated for DNA synthesis (e.g. nt-phosphoramidites) with a small amount of one or more of the other activated nts. Contrary to general practice, the present invention sets the level of spiking so that only a small percentage (1% to .00001%, for example) of the final product will contain the initial DNA sequence. This will insure that many single, double, triple, and higher mutations occur, but that recovery of the basic sequence will be a possible outcome. Let N_b be the number of bases to be varied, and let Q be the fraction of all sequences that should have the parental sequence, then M, the fraction of the mixture that is the majority component, is

M = exp( log_e(Q)/N_b ) = 10^( log_10(Q)/N_b )
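The relation for M can be evaluated directly, and the fraction of molecules carrying a given number of non-parental bases can be estimated with a binomial model. The sketch below is an assumption-laden illustration: it treats every varied base as independent with the same majority fraction M, which is the natural reading of the formula but is not spelled out in the text, and the function names are hypothetical.

```python
import math


def majority_fraction(q: float, n_bases: int) -> float:
    """M such that a fraction q of the product keeps the parental base at
    all n_bases varied positions, i.e. M ** n_bases == q."""
    return math.exp(math.log(q) / n_bases)


def fraction_with_n_changes(n: int, n_bases: int, m: float) -> float:
    """Binomial estimate of the fraction of molecules having exactly n
    non-parental bases among n_bases independently spiked positions."""
    return math.comb(n_bases, n) * m ** (n_bases - n) * (1.0 - m) ** n


def most_probable_changes(n_bases: int, m: float) -> int:
    """Number of non-parental bases with the highest probability."""
    return max(range(n_bases + 1),
               key=lambda n: fraction_with_n_changes(n, n_bases, m))


if __name__ == "__main__":
    m = majority_fraction(0.01, 30)  # 1% parental sequence over 30 bases
    print(f"M = {m:.3f}, most probable number of changes = "
          f"{most_probable_changes(30, m)}")
```

Lowering Q shifts the whole distribution toward multiply substituted molecules, at the cost of making the parental sequence rarer.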

If, for example, thirty base pairs on the DNA chain were to be varied and 1% of the product is to have the parental sequence, then each mixed nt substrate should contain 86% of the parental nt and 14% of other nts. Table 8 shows the fraction (fn) of DNA molecules having n non-parental bases when 30 bases are synthesized with reagents that contain fraction M of
the majority conponcnt. When M=. 63096. f24 and higher 
are less than 10"^. The entry "niost" in Table 8 is the 
number of changes that has the highest probability. Note that substantial probability for multiple substitutions only occurs if the fraction of parental sequence (f0) is allowed to drop to around 10^-6.

Mutagenesis of this sort can be applied to any part of the protein at any time, but is most appropriate when some binding to the target has been established. The N_b base pairs of the DNA chain that are synthesized with mixed reagents need not be contiguous. They are picked so that between N_b/3 and N_b codons are affected to various degrees. The residues picked for mutation are picked with reference to the 3D structure of the IPBD, if known. For example, one might pick all or most of the residues in the principal and secondary set. We may impose restrictions on the extent of variation at each of these residues based on homologous sequences or other data. The mixture of non-parental nts need not be random, rather mixtures can be biased to give particular amino acid types specific probabilities of appearance at each codon. For example, one residue may contain a hydrophobic amino acid in all known homologous sequences; in such a case, the first and third base of that codon would be varied, but the second would be set to T. Other examples of how this might be done will be given in the Detailed Example. This diffuse structure-directed mutagenesis will reveal the subtle changes possible in protein backbone associated with conservative interior changes, such as V to I, as well as some not so subtle changes that require concomitant changes at two or more residues of the protein.

For Focused Mutagenesis, we now consider the distribution of nts that will be inserted at each variegated codon. Each codon could be programmed differently. If we have no information indicating that a particular amino acid or class of amino acid is appropriate, we strive to substitute all amino acids with equal probability because representation of one pbd above the detectable level is wasteful. Equal amounts of all four nts at each position in a codon yields the amino acid distribution:

4/64 A   2/64 C   2/64 D   2/64 E   2/64 F   4/64 G
2/64 H   3/64 I   2/64 K   6/64 L   1/64 M   2/64 N
4/64 P   2/64 Q   6/64 R   6/64 S   4/64 T   4/64 V
1/64 W   2/64 Y   3/64 stop

This distribution has the disadvantage of giving two basic residues for every acidic residue. In addition, six times as much R, S, and L as W or M occur. If five codons are synthesized with this distribution, sequences encoding five Rs are 7776 times more abundant than sequences encoding five Ws. To have W-W-W-W-W present at detectable levels, we must have R-R-R-R-R present in 7776-fold excess.
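The equimolar distribution and the 7776-fold bias can be reproduced by counting codons in the standard genetic code. The sketch below uses exact fractions; the function names are illustrative and not taken from the original disclosure.

```python
from collections import Counter
from fractions import Fraction

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def equimolar_abundance() -> dict:
    """Abundance of each amino acid ('*' = stop) when all four nts are
    equally likely at every codon position: codon count divided by 64."""
    counts = Counter(CODON_TABLE.values())
    return {aa: Fraction(n, 64) for aa, n in counts.items()}


def relative_abundance(seq_a: str, seq_b: str, abun: dict) -> Fraction:
    """How many times more abundant peptide seq_a is than seq_b when every
    position is encoded by an independent variegated codon."""
    ratio = Fraction(1)
    for x, y in zip(seq_a, seq_b):
        ratio *= abun[x] / abun[y]
    return ratio


if __name__ == "__main__":
    abun = equimolar_abundance()
    print("basic/acidic:", (abun["K"] + abun["R"]) / (abun["D"] + abun["E"]))
    print("RRRRR vs WWWWW:", relative_abundance("RRRRR", "WWWWW", abun))
```

The same two helpers can be used to compare any pair of peptides under a given abundance table, which makes the cost of an uneven codon distribution easy to quantify.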

Consider the distribution of amino acids encoded by one codon in a population of vgDNA. Let Abun(x) be the abundance of DNA sequences coding for amino acid x; Abun(x) is uniquely defined by the distribution of nts at each base of the codon. For any distribution, there will be a most-favored amino acid (mfaa) with abundance Abun(mfaa) and a least-favored amino acid (lfaa) with abundance Abun(lfaa). We seek the nt distribution that allows all twenty amino acids and that yields the largest ratio Abun(lfaa)/Abun(mfaa) subject to two constraints. First, the abundances of acidic and basic amino acids should be equal lest we bias the PBDs toward a particular charge. Second, the number of stop codons should be kept as low as possible. Thus only nt distributions that yield Abun(E)+Abun(D) = Abun(R)+Abun(K) are considered, and the function maximized is:

{ (1-Abun(stop)) (Abun(lfaa)/Abun(mfaa)) }.

We have simplified the search for an optimal nt distribution by limiting the third base to T or G; C or G at the third base would be equivalent. All amino acids are possible and the number of accessible stop codons is reduced because TGA and TAA codons are eliminated. The amino acids F, Y, C, H, N, I, and D require T at the third base while W, M, Q, K, and E require G. Thus we use an equimolar mixture of T and G at the third base.

A computer program, written as part of the present invention and named "Find Optimum vgCodon" (see Table 9), varies the composition at bases 1 and 2, in steps of 0.05, and reports the composition that gives the largest value of the quantity { (Abun(lfaa)/Abun(mfaa)) (1-Abun(stop)) }. A vgCodon is symbolically defined by the nt distribution at each base:

base     1     2     3
T        t1    t2    t3
C        c1    c2    c3
A        a1    a2    a3
G        g1    g2    g3

t1 + c1 + a1 + g1 = 1.0
t2 + c2 + a2 + g2 = 1.0
t3 = g3 = 0.5, c3 = a3 = 0.0



The variation of the quantities t1, c1, a1, g1, t2, c2,
a2, and g2 is subject to the constraint that
Abun(E)+Abun(D) equals Abun(K)+Abun(R):

    Abun(E)+Abun(D) = g1*a2
    Abun(K)+Abun(R) = a1*a2/2 + c1*g2 + a1*g2/2
    g1*a2 = a1*a2/2 + c1*g2 + a1*g2/2

Solving for g2, we obtain

    g2 = (g1*a2 - 0.5*a1*a2) / (c1 + 0.5*a1)

In addition,

    t1 = 1 - a1 - c1 - g1
    t2 = 1 - a2 - c2 - g2

We vary a1, c1, g1, a2, and c2 and then calculate t1,
g2, and t2. Initially, variation is in steps of 5%.
Once an approximately optimum distribution of nts is
determined, the region is further explored with steps
of 1%. The logic of this program is shown in Table 9.
The optimum distribution is:
Optimum vgCodon

                T      C      A      G
    base #1 =   0.26   0.18   0.26   0.30
    base #2 =   0.22   0.16   0.40   0.22
    base #3 =   0.5    0.0    0.0    0.5

and yields DNA molecules encoding each type of amino acid
with the abundances shown in Table 10.
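
The quantities used in the search can be evaluated for any vg codon with a few lines of code. What follows is a minimal sketch, not the program of Table 9. The function names abundances, objective, and solve_g2 are hypothetical, the standard genetic code is assumed, and the C fraction at base 1 is taken as 0.18 so that the row sums to 1.0 and satisfies the charge-balance constraint.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def abundances(vg_codon):
    """Return Abun(x) for each amino acid x and for stop ("*").

    vg_codon is a list of three dicts (bases 1, 2, 3), each mapping
    a nt to its fraction at that base.
    """
    abun = {}
    for codon, aa in CODON_TABLE.items():
        p = 1.0
        for position, nt in enumerate(codon):
            p *= vg_codon[position][nt]
        abun[aa] = abun.get(aa, 0.0) + p
    return abun

def objective(abun):
    """(1 - Abun(stop)) * Abun(lfaa)/Abun(mfaa)."""
    amino = [v for aa, v in abun.items() if aa != "*"]
    return (1.0 - abun["*"]) * min(amino) / max(amino)

def solve_g2(a1, c1, g1, a2):
    """g2 that makes Abun(E)+Abun(D) equal Abun(K)+Abun(R)."""
    return (g1 * a2 - 0.5 * a1 * a2) / (c1 + 0.5 * a1)

optimum = [
    dict(T=0.26, C=0.18, A=0.26, G=0.30),
    dict(T=0.22, C=0.16, A=0.40, G=0.22),
    dict(T=0.5, C=0.0, A=0.0, G=0.5),
]
abun = abundances(optimum)
acidic = abun["D"] + abun["E"]
basic = abun["K"] + abun["R"]
print("acidic", round(acidic, 4), "basic", round(basic, 4))
print("g2 from constraint", round(solve_g2(0.26, 0.18, 0.30, 0.40), 3))
print("objective", round(objective(abun), 4))
```

A full search would wrap these functions in loops over a1, c1, g1, a2, and c2, discarding combinations for which any computed fraction falls outside the range 0 to 1.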



as the Milligen 7500, can be programmed to synthesize
any base of an oligo-nt with any distribution of nts by
taking some nt substrates (e.g. nt phosphoramidites)
from each of two or more reservoirs. Alternatively, nt
substrates can be mixed in any ratios and placed in one
of the extra reservoirs for so-called "dirty bottle"
synthesis. Either of these methods amounts to
specifying the nt distribution. The actual nt
distribution obtained will differ from the specified nt
distribution due to several causes, including: a)
differential inherent reactivity of nt substrates, and
b) differential deterioration of reagents. It is
possible to compensate partially for these effects, but
some residual error will occur. We denote the average
discrepancy between specified and observed nt fraction
as Serr,

    Serr = square root( average[ ((fobs - fspec)/fspec)^2 ] )

where fobs is the amount of one type of nt found at a
base and fspec is the amount of that type of nt that
was specified at the same base. The average is over
all specified types of nts and over a number (e.g. 10
or 20) of different variegated bases. By hypothesis, the
actual nt distribution at a variegated base will be
within 5% of the specified distribution. Actual DNA
synthesizers and DNA synthetic chemistry may have
different error levels. It is the user's
responsibility to determine Serr for the DNA
synthesizer and chemistry employed by the user.
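
As an illustration of the definition of Serr, the sketch below computes the root-mean-square fractional discrepancy between specified and observed nt fractions. It assumes the square is taken inside the average; the function name s_err and the sample data are hypothetical and are not measurements from any synthesizer.

```python
import math

def s_err(specified, observed):
    """Root-mean-square fractional error over all specified nts.

    specified and observed are lists with one dict per variegated base,
    each dict mapping a nt to its fraction. Nts with a specified
    fraction of zero are skipped, since the average runs only over
    specified types of nts.
    """
    squares = []
    for spec_base, obs_base in zip(specified, observed):
        for nt, f_spec in spec_base.items():
            if f_spec > 0.0:
                squares.append(((obs_base[nt] - f_spec) / f_spec) ** 2)
    return math.sqrt(sum(squares) / len(squares))

# Hypothetical data: every observed fraction is off by exactly 5%.
specified = [dict(T=0.2, C=0.4, A=0.4, G=0.0)]
observed = [dict(T=0.21, C=0.38, A=0.42, G=0.0)]
print(round(s_err(specified, observed), 4))
```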



To determine the possible effects of errors in nt
composition on the amino-acid distribution, we modified
the program "Find Optimum vgCodon" in four ways:

1) the fraction of each nt in the first two bases
is allowed to vary from its optimum value times (1
- Serr) to the optimum value times (1 + Serr) in
seven equal steps (Serr is the hypothetical
fractional error level entered by the user); the
sum of nt fractions at one base always equals 1.0,

2) g2 is varied in the same manner as a2, i.e. we
dropped the restriction that Abun(D)+Abun(E) =
Abun(K)+Abun(R),

3) t3 and g3 are varied from 0.5 times (1 - Serr)
to 0.5 times (1 + Serr) in three equal steps,

4) the smallest ratio Abun(lfaa)/Abun(mfaa) is
sought.

In actual experiments, we will direct the synthesizer
to produce the optimum DNA distribution "Optimum
vgCodon" given above. Incomplete control over DNA
chemistry may, however, cause us to actually obtain the
following distribution that is the worst that can be
obtained if all nt fractions are within 5% of the
amounts specified in "Optimum vgCodon". A
corresponding table can be calculated for any given
Serr using the program "Find worst vgCodon within Serr
of given distribution." given in Table 11.

Optimum vgCodon, worst 5% errors

                T       C       A       G
    base #1 =   0.251   0.189   0.273   0.287
    base #2 =   0.209   0.160   0.400   0.231
    base #3 =   0.475   0.0     0.0     0.525

This distribution yields DNA encoding different
amino acids at the abundances shown in Table 12.

If five codons are synthesized with reagents mixed
so as to produce the nt-distribution "Optimum vgCodon",
and if we actually obtained the nt-distribution
"Optimum vgCodon, worst 5% errors", then DNA sequences
encoding the mfaa at all of the five codons are about
277 times as likely as sequences encoding the lfaa
at all of the five codons; about 24% of the DNA
sequences will have a stop codon in one or more of the
five codons.

When five codons are synthesized using equimolar
mixtures at bases 1 and 2, (Abun(mfaa)/Abun(lfaa))^5 =
7776; if we program the optimum nt distribution and
come within 5%, then (Abun(mfaa)/Abun(lfaa))^5 = 277.
The total number of different PBDs is unchanged, but
the least-favored sequence is about 28 times more
abundant. Detecting the least-favored amino-acid
sequence when varying four residues with equimolar nts
at each varied base requires as sensitive a separation
system as does detecting the least-favored amino-acid
sequence when varying five residues with the optimized
nt distribution.
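
The figures quoted for the worst-case distribution can be reproduced numerically. The sketch below is an independent check, not the program of Table 11. It assumes the standard genetic code and takes the C fraction at base 1 as 0.189 so that the row sums to 1.0; the variable names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# "Optimum vgCodon, worst 5% errors": fractions of T, C, A, G at bases 1-3.
WORST = [(0.251, 0.189, 0.273, 0.287),
         (0.209, 0.160, 0.400, 0.231),
         (0.475, 0.0, 0.0, 0.525)]

# Abundance of each amino acid (and of stop, "*") for one variegated codon.
abun = {}
for codon, aa in CODON_TABLE.items():
    p = 1.0
    for position, nt in enumerate(codon):
        p *= WORST[position][BASES.index(nt)]
    abun[aa] = abun.get(aa, 0.0) + p

amino = {aa: v for aa, v in abun.items() if aa != "*"}
mfaa = max(amino, key=amino.get)          # most-favored amino acid
lfaa = min(amino, key=amino.get)          # least-favored amino acid
ratio_5 = (amino[mfaa] / amino[lfaa]) ** 5
stop_fraction = 1.0 - (1.0 - abun["*"]) ** 5   # stop in one or more of 5 codons
gain = 6 ** 5 / ratio_5                   # improvement over equimolar nts

print(mfaa, lfaa, round(ratio_5), round(stop_fraction, 3), round(gain, 1))
```

Under these assumptions the most-favored amino acid is R and the least-favored is F, which agrees with the value Abun(F) = .0249 used for the parental sequence.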

By hypothesis, the distribution "Optimum vgCodon"
is used in the second version of the second variegation
of hypothetical example 2. The abundance of the DNA
encoding each type of amino acid is, however, taken
from Table 12. The abundance of DNA encoding the
parental amino acid sequence is:

    Amount (parental seq.)

          F24       G30       D34       E42       T47
        = Abun(F) * Abun(G) * Abun(D) * Abun(E) * Abun(T)
        = .0249   x .0663   x .0545   x .0602   x .0437
        = 2.4 x 10^-7

Therefore, DNA encoding the PPBD sequence as well as
very many related sequences will be present in
sufficient quantity to be detected and we are assured
that the process will be progressive.

We use the following procedure to determine
whether a given level of variegation is practical:

1) from: a) the intended nt-distribution at each
base of a variegated codon, and b) Serr (the error
level in mixed DNA synthesis), calculate the
abundances of DNA sequences coding for each amino
acid and stop,

2) calculate the abundance of DNA encoding the
PPBD sequence by multiplying the abundances of the
parental amino acid at each variegated residue,
The abundances used in the procedure above are
calculated from the worst distribution that is within
Serr of the specified distribution. A variegation that
ensures that the PPBD sequence can be recovered is
practical. PPBD can be recovered if the abundance of
PPBD-encoding DNA is larger than both 1/Mntv and
1/Csensi. Preferably, the abundance of PPBD-encoding
DNA is 3 to 10 times higher than both 1/Mntv and
1/Csensi to provide a margin of redundancy. Mntv is
the number of transformants that can be made from YD100
DNA. With current technology Mntv is approximately 3 x
10^, but the exact value depends on the details of the
procedures adopted by the user. Improvements in
technology that allow more efficient: a) synthesis of
DNA, b) ligation of DNA, or c) transformation of cells
will raise the value of Mntv. Csensi is the
sensitivity of the affinity separation; improvements in
affinity separation will raise Csensi. If the smaller
of Mntv and Csensi is increased, higher levels of
variegation may be used. For example, if Csensi is 1
in 10^ and Mntv is 10^, then improvements in Csensi are
less valuable than improvements in Mntv.
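
The practicality test described above reduces to a product and a comparison. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the abundances are the five Table 12 values quoted in the text, and the values given for Mntv and Csensi are placeholders chosen only to exercise the code, not figures from this document.

```python
import math

def parental_abundance(abun, parental_residues):
    """Multiply the abundances of the parental amino acid at each
    variegated residue."""
    return math.prod(abun[aa] for aa in parental_residues)

def is_practical(abundance, m_ntv, c_sensi, margin=1.0):
    """True if the PPBD-encoding DNA is more abundant than both
    1/Mntv and 1/Csensi, by the given safety margin (3 to 10 is
    preferred in the text)."""
    threshold = max(1.0 / m_ntv, 1.0 / c_sensi)
    return abundance > margin * threshold

# Abundances quoted in the text for F, G, D, E, and T (from Table 12).
table12_subset = {"F": .0249, "G": .0663, "D": .0545, "E": .0602, "T": .0437}
amount = parental_abundance(table12_subset, "FGDET")
print(f"{amount:.1e}")

# Placeholder limits (assumptions for illustration only).
print(is_practical(amount, m_ntv=1e8, c_sensi=1e8, margin=10.0))
print(is_practical(amount, m_ntv=1e6, c_sensi=1e8))
```

Because the threshold is set by the larger of 1/Mntv and 1/Csensi, only an improvement in the smaller of the two limits permits a higher level of variegation.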

A level of variegation that allows recovery of the
PPBD has two properties:

1) we can not regress because the PPBD is
available,

2) an enormous number of multiple changes related
to the PPBD are available for selection and we are
able to detect and benefit from these changes.

It is very unlikely that all of the variants will
be worse than the PPBD; we require the presence of PPBD
at detectable levels to insure that all the sequences
present are indeed related to PPBD.

The user must adjust the list of residues to be
varied and levels of variegation at each residue until
the calculated variegation is within the bounds set by
Mntv and Csensi.

Preferably, we also consider the interactions
between the sites of variegation and the surrounding
DNA. If the method of mutagenesis to be used is
replacement of a cassette, we consider whether the
variegation will generate gratuitous restriction sites
and whether they seriously interfere with the intended
introduction of diversity. We reduce or eliminate
gratuitous restriction sites by appropriate choice of
variegation pattern and silent alteration of codons
neighboring the sites of variegation. See the Detailed
Example.

Sec. 14.1: Insertion of synthetic vgDNA into a plasmid:

In the case of cassette mutagenesis, the
restriction sites that were introduced when the gene
for the inserted domain was synthesized are used to
introduce the synthetic vgDNA into a plasmid or other
OCV. Restriction digestions and ligations are
performed by standard methods (AUSU87).

In the case of single-stranded-oligonucleotide-
directed mutagenesis, synthetic vgDNA is used to create
diversity in the vector (BOTS85).



Sec. 14.2: Transformation of cells:

The present invention is not limited to any one
method of transforming cells with DNA. The following
procedure is a modification of that of Maniatis (p250,
MANI82). This procedure is only one example of how the
necessary transformations may be performed. The
procedure produces approximately (V0/25) x 10^ or more
transformants. The user picks a value for V0, the
initial volume of the cell culture, to provide the
desired number of transformants. All water is triple
distilled and is treated with activated charcoal for 24
hours.

1) culture E. coli in V0 ml of LB broth at 37°C until
cell density reaches 5 x 10^7 to 7 x 10^7 cells/ml,

2) chill on ice for 65 minutes, centrifuge the cell
suspension at 4000g for 5 minutes at 4°C,

3) discard supernatant; resuspend the cells in ml
of an ice-cold, sterile solution of 60 mM CaCl2,

4) chill on ice for 15 minutes, and then centrifuge at
4000g for 5 minutes at 4°C,

5) discard supernatant; resuspend cells in 2 x
ml of ice-cold, sterile 60 mM CaCl2; store cells at
4°C for from 10 minutes to 24 hours; transformation
efficiency increases by about 4-fold in the first 24
hours and then returns to the original value,

6) add DNA in ligation or TE buffer to V0/250 ml of
cells; mix and store on ice for 30 minutes,

7) heat shock cells at 42°C for an appropriate amount
of time,

8) add V0/25 ml LB broth and incubate at 37°C for 1
hour,

9) plate cells on LB agar containing antibiotic,

10) harvest GPs in appropriate manner.

It is not necessary to isolate transformed cells
between transformation and affinity separation. We
prefer to have transformed cells at high concentration
so that they can be plated densely on relatively few
plates. For this purpose, steps (9) and (10) may be
replaced with a procedure in which the cells in step
(8) are further diluted with LB broth and the
selecting antibiotic is added. In the case of
ampicillin, lysis of sensitive cells occurs, and
resistant cells are enriched by centrifugation at 2 to
3 h after addition of antibiotic.

One routinely obtains between 10^ and 5 x 10^
transformants/ug of CCC DNA. Ligation efficiency
ranges from 0.1% for blunt-blunt insertions, to as
much as 15% for sticky-sticky insertions. For large
transformations, it may be desirable to purify DNA
between ligation and transformation because unligated
DNA is thought to compete with CCC DNA for entry into
the competent cells. Only a small fraction of cells
are competent, typically 0.1%. The heat shock has
been optimized for transformation reactions carried
out in a volume of 200 ul in a plastic Eppendorf tube;
optimizing this step for larger volumes is possible.
This procedure requires up to 2 ug DNA per 10^
transformants.

Sec. 14.3: Growth of GP(vgPBD) population:

The transformed cells are grown first under non-
selective conditions that allow expression of plasmid
genes and then selected to kill untransformed cells.
Transformed cells are then induced to express the osp-
pbd gene at the appropriate level of induction, as
determined in Sec. 10.1. The GPs carrying the IPBD
are harvested by a method appropriate to the package.

A high level of diversity can be generated by in
vitro variegated synthesis of DNA and this diversity
can be maintained passively through several
generations in an organism without positive selective
pressure. Loss or reduction in frequency of
deleterious mutations is advantageous for the purposes
of the present invention. As we do not know how one
might press E. coli or any other kind of cell to
actively maintain diversity, we specify that the vgDNA
must be used to prepare plasmids, that the plasmids
are used to transform cells, and that the selection
must be performed before more than a few generations
elapse. Moreover, subdividing the variegated
population before amplification in an organism by
removing a small sample (less than 10%) for further
work would result in loss of diversity; therefore, one
should use all or most of the synthetic DNA and most
or all of the transformed cells.

Sec. 15: Isolation of GPs with binding-to-target
phenotypes:

The harvested packages are now enriched for the
binding-to-target phenotype by use of affinity
separation involving the target material immobilized
on an affinity matrix. Packages that fail to bind to
the target material are washed away. If the packages
are bacteriophage or endospores, it may be desirable
to include a bacteriocidal agent, such as azide, in
the buffer to prevent bacterial growth. The buffers
used in chromatography must include: a) any ions or
other solutes needed to stabilize the target, and b)
any ions or other solutes needed to stabilize the PBDs
derived from the IPBD.

Sec. 15.1: Attaching the target material to a column:

Affinity column chromatography is the preferred
method of affinity separation, but other affinity
separation methods may be used. A variety of
commercially available support materials for affinity
chromatography are used. These include derivatized
beads to which the target material is covalently
linked, or non-derivatized material to which the
target material adheres irreversibly.

Suppliers of support material for affinity
chromatography include: Applied Protein Technologies,
Cambridge, MA; Bio-Rad Laboratories, Rockville Center,
NY; Pierce Chemical Company, Rockford, IL. Target
materials are attached to the matrix in accord with
the directions of the manufacturer of each matrix
preparation with consideration of good presentation of
the target.

Sec. 15.2: Reducing selection due to non-specific
binding:

We reduce non-specific binding of GP(PBD)s to the
matrix that bears the target in two ways:

1) we treat the column with blocking agents such
as genetically defective GPs or a solution of
protein before the population of GP(vgPBD)s is
chromatographed, and

2) we pass the population of GP(vgPBD)s over a
matrix containing no target or a different target
from the same class as the actual target prior to
affinity chromatography.

Step (1) above saturates any non-specific binding that
the affinity matrix might show toward wild-type GPs or
proteins in general; step (2) removes components of
our population that exhibit non-specific binding to
the matrix or to molecules of the same class as the
target. If the target were horse heart myoglobin, for
example, a column supporting bovine serum albumin
could be used to trap GPs exhibiting PBDs with strong
non-specific binding to proteins. If cholesterol were
the target, then a hydrophobic compound, such as p-
tertiarybutylbenzyl alcohol, could be used to remove
GPs displaying PBDs having strong non-specific binding
to hydrophobic compounds. It is anticipated that PBDs
that fail to fold or that are prematurely terminated
will be non-specifically sticky. These sequences
could outnumber the PBDs having desirable binding
properties. Thus, the capacity of the initial column
that removes indiscriminately adhesive PBDs should be
greater (e.g. 5 fold greater) than the column that
supports the target molecule.
Variation in the support material (polystyrene,
glass, agarose, cellulose, etc.) in analysis of clones
carrying SBDs is used to eliminate enrichment for
packages that bind to the support material rather than
the target.



Sec. 15.3: Eluting the column:

To separate the GP(PBD)s that carry PBDs that
show actual binding to the target from GP(PBD)s that
carry PBDs that do not actually show binding to the
target, the population of GPs is applied to an
affinity matrix under conditions compatible with the
intended use of the binding protein and the population
is fractionated by passage of a gradient of some
solute over the column. The process enriches for PBDs
having affinity for the target and for which the
affinity for the target is least affected by the
eluants used. The enriched fractions are those
containing viable GPs that elute from the column at
greater concentration of the eluant.



Any ions or cofactors needed for stability of
PBDs (derived from IPBD) or target must be included in
initial and elution buffers at appropriate levels. We
first remove GP(PBD)s that do not bind the target by
washing the matrix with the volume of the initial
buffer required to bring the optical density (at 260
nm or 280 nm) back to base line plus one void volume
(Vv), but not more than 5 Vv. The column is then
eluted with a gradient of increasing: a) salt, b) [H+]
(decreasing pH), c) neutral solutes, d) temperature
(increasing or decreasing), or e) some combination of
these factors. The solutes in each of the first three
gradients have been found generally to weaken non-
covalent interactions between proteins and bound
molecules. Salt is the most preferred solute for
gradient formation in most cases. Other solutes that
generally weaken non-covalent interaction between
proteins and bound molecules may also be used. "Salt"
includes solutions containing any or all of the
following ionic species:

    Na+        K+         Ca++       Mg++
    Li+        Sr++       Ba++
    Cl-        Br-        HSO4-      PO4---
    HPO4--     CO3--      HCO3-      Acetate
    Citrate    Standard Amino Acids
    Standard nucleotides             Guanidinium Cl

Other ionic or neutral solutes may be used. All
solutes are subject to the necessity that they not
kill the genetic packages. Because bacteria continue
to metabolize during affinity separation, the choice
of buffer components is more restricted for bacteria
than for bacteriophage or spores. Neutral solutes,
such as ethanol, acetone, ether, or urea, are
frequently used in protein purification and are known
to weaken non-covalent interactions between proteins
and other molecules. Many of these species are,
however, very harmful to bacteria and bacteriophage.
Bacterial spores, on the other hand, are impervious to
most neutral solutes. Several passes may be made
through the steps in Sec. 15. Different solutes may
be used in different analyses, salt in one, pH in the
next, etc.
genetic material is essential. If cells, spores, or
virions bind irreversibly to the matrix but are not
killed, we can recover the information through in situ
cell division, germination, or infection respectively.
Proteolytic degradation of the packages and recovery
of DNA is not preferred.

Although degradation of the bound GPs and
recovery of genetic material is a possible mode of
operation, inadvertent inactivation of the GPs is very
deleterious. It is preferred that maximum limits for
solutes that do not inactivate the GPs or denature the
target or the column are determined. If the affinity
matrices are expendable, one may use conditions that
denature the column to elute GPs; before the target is
denatured, a portion of the affinity matrix should be
removed for possible use as an inoculum. As the GPs
are held together by protein-protein interactions and
other non-covalent molecular interactions, there will
be cases in which the molecular package will bind so
tightly to the target molecules on the affinity matrix
that the GPs can not be washed off in viable form.
This will only occur when very tight binding has been
obtained. In these cases, methods (3) through (5)
above can be used to obtain the bound packages or the
genetic messages from the affinity matrix.

It is possible, by manipulation of the elution
conditions, to isolate SBDs that bind to the target at
one pH (pHb) but not at another pH (pHo). The
population is applied at pHb and the column is washed
thoroughly at pHb. The column is then eluted with
buffer at pHo and GPs that come off at the new pH are
collected and cultured. Similar procedures may be
used for other solution parameters, such as
temperature. For example, GP(vgPBD)s could be applied
to a column supporting insulin. After eluting with
salt to remove GPs with little or no binding to
insulin, we elute with salt and glucose to liberate
GPs that display PBDs that bind insulin or glucose in
a competitive manner.



Sec. 15.5: Amplifying the Enriched Packages:

Viable GPs having the selected binding trait are
amplified by culture in a suitable medium, or, in the
case of phage, infection into a host so cultivated.
If the GPs have been inactivated by the
chromatography, the OCV carrying the osp-pbd gene must
be recovered from the GP, and introduced into a new,
viable host.



Sec. 15.6: Determining whether further enrichment is
needed:

The probability of isolating a GP with improved
binding increases by Ceff with each separation cycle.
Let N be the number of distinct amino-acid sequences
produced by the variegation. We want to perform K
separation cycles before attempting to isolate an SBD,
where K is such that the probability of isolating a
single SBD is 0.10 or higher.

    K = the smallest integer >= log10(0.10 N)/log10(Ceff)

For example, if N were 1.0 x 10^7 and Ceff =
6.31 x 10^2, then log10(1.0 x 10^6)/log10(6.31 x 10^2) =
6.0000/2.8000 = 2.14. Therefore we would attempt to
isolate SBDs after the third separation cycle. After
only two separation cycles, the probability of finding
an SBD is (6.31 x 10^2)^2/(1.0 x 10^7) = .04 and
attempting to isolate SBDs might be profitable.
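
The rule for choosing K can be written as a one-line calculation. The sketch below is illustrative; the function names are hypothetical, and the example values N = 1.0 x 10^7 and Ceff = 6.31 x 10^2 are those of the worked example, with N inferred from the quoted probability of .04 after two cycles.

```python
import math

def cycles_needed(n_sequences, c_eff, min_probability=0.10):
    """Smallest integer K with K >= log10(min_probability * N)/log10(Ceff)."""
    return math.ceil(math.log10(min_probability * n_sequences)
                     / math.log10(c_eff))

def probability_after(cycles, n_sequences, c_eff):
    """Probability of isolating a single SBD after the given number of
    separation cycles, capped at 1."""
    return min(1.0, c_eff ** cycles / n_sequences)

N = 1.0e7      # distinct amino-acid sequences in the variegated population
C_EFF = 6.31e2  # enrichment per separation cycle

print(cycles_needed(N, C_EFF))
print(round(probability_after(2, N, C_EFF), 3))
```

Because each cycle multiplies the probability by Ceff, stopping one cycle early reduces the chance of isolating an SBD by that same factor.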



Clonal isolates from the last fraction eluted in
Sec. 15.3 containing any viable GPs, as well as clonal
isolates obtained by culturing an inoculum taken from
the affinity matrix, are cultured in a growth step
that is similar to that described in Sec. 14.3. If K
separation cycles have been completed, samples from a
number, e.g. 32, of these clonal isolates are tested
for elution properties on the (target) column. If
none of the isolated, genetically pure GPs show
improved binding to target, or if K cycles have not
yet been completed, then we pool and culture, in a
manner similar to the manner set forth in Sec. 14.3,
the GPs from the last few fractions eluted (see Sec.
15.4) that contained viable GPs and from the GPs
obtained by culturing an inoculum taken from the
column matrix. We then repeat the enrichment
procedure described in Sec. 15. This cyclic
enrichment may continue Nchrom passes or until an SBD
is isolated.

If one or more of the isolated GPs has improved
retention on the (target) column, we determine whether
the retention of the candidate SBDs is due to affinity
for the target material as follows. A second column
is prepared using a different support matrix with the
target material bound at the optimal density. The
elution volumes, under the same elution conditions as
used previously (see Sec. 15.3), of candidate GP(SBD)s
are compared to each other and to GP(PPBD of this
round). If one or more candidate GP(SBD)s has a
larger elution volume than GP(PPBD of this round),
then we pick the GP(SBD) having the highest elution
volume and proceed to characterize the population (see
Sec. 15.7). If none of the candidate GP(SBD)s has
higher elution volume than GP(PPBD of this round),
then we pool and culture, in a manner similar to the
manner used previously (Sec. 15.3), the GPs from the
last few fractions that contained viable GPs and the
GPs obtained by culturing an inoculum taken from the
column matrix. We then repeat the enrichment
procedure of Sec. 15.

If all of the SBDs show binding that is superior
to PPBD of this round, we pool and culture the GPs
from the last fraction that contains viable GPs and
from the inoculum taken from the column. This
population is re-chromatographed at least one pass to
fractionate further the GPs based on Kd.

If an RNA phage were used as GP, the RNA would
either be cultured with the assistance of a helper
phage or be reverse transcribed and the DNA amplified.
The amplified DNA could then be sequenced or subcloned
into suitable plasmids.

Sec. 15.7: Characterizing the population:

We characterize members of the population showing
desired binding properties by genetic and biochemical
methods. We obtain clonal isolates and test these
strains by genetic and affinity methods to determine
genotype and phenotype with respect to binding to
target. For several genetically pure isolates that
show binding, we demonstrate that the binding is
caused by the artificial chimeric gene by excising the
osp-sbd gene and crossing it into the parental GP. We
also ligate the deleted backbone of each GP from which
the osp-sbd is removed and demonstrate that each
backbone alone cannot confer binding to the target on
the GP. We sequence the osp-sbd gene from several
clonal isolates. Primers for sequencing are chosen
from the DNA flanking the osp-ppbd gene or from parts
of the osp-ppbd gene that are not variegated.

Sec. 15.8: Testing of binding affinity:

For one or more clonal isolates, we subclone the
sbd gene fragment, without the osp fragment, into an
expression vector such that each SBD can be produced
as a free protein. Because numerous unique
restriction sites were built into the inserted domain,
it is easy to subclone the gene at any time. Each SBD
protein is purified by normal means, including
affinity chromatography. Physical measurements of the
strength of binding are then made on each free SBD
protein by one of the following methods: 1) alteration
of the Stokes radius as a function of binding of the
target material, measured by characteristics of
elution from a molecular sizing column such as
agarose, 2) retention of radiolabeled binding protein
on a spun affinity column to which has been affixed
the target material, or 3) retention of radiolabeled
target material on a spun affinity column to which has
been affixed the binding protein. The measurements of
binding for each free SBD are compared to the
corresponding measurements of binding for the PPBD.

In each assay, we measure the extent of binding
as a function of concentration of each protein, and
other relevant physical and chemical parameters such
as salt concentration, temperature, pH, and prosthetic
group concentrations (if any).
In addition, the SBD with highest affinity for
the target from each round is compared to the best SBD
of the previous round (IPBD for the first round) and
to the IPBD (second and later rounds) with respect to
affinity for the target material. Successive rounds
of mutagenesis and selection-through-binding yield
increasing affinity until desired levels are achieved.

If we find that the binding is not yet
sufficient, we must decide which residues to vary next
(see Sec. 16.0). If the binding is sufficient, then
we now have an expression vector bearing a gene
encoding the desired novel binding protein.

Sec. 15.9: Other Affinity Separation Means:

FACS may be used to separate GPs that bind
fluorescent labeled target with the optimized
parameters determined in Part II. We discriminate
against artifactual binding to the fluorescent label by
using two or more different dyes, chosen to be
structurally different. GPs isolated using target
labeled with a first dye are cultured. These GPs are
then tested with target labeled with a second dye.

Electrophoretic affinity separation uses
unaltered target so that only other ions in the buffer
can give rise to artifactual binding. Artifactual
binding to the gel material gives rise to retardation
independent of field direction and so is easily
eliminated. A variegated population of GPs will have
a variety of charges. The following 2D
electrophoretic procedure accommodates this variation
in the population.
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First the Vtiricgatod population of CPs is 
elcctrophorcscd in a gel that contains no target 
material. The electrophoresis continues until the CP 
5 s are distributed along the length of the lane. The 
gels described Ly Sower for phage are very lov in 
agarose and lack ncchanical stability. The target- 
free lane in which the initial electrophoresis is 
conducted is separate frcn a square of gel chat 
10 contains target material by a removable baffle. After 
the first pass, the baffle is removed and a second 
electrophoresis is conducted at right angles to the 
first. CPs that do not bind target migrate vith 
unaltered nobility while CP s that do bind target uiH 
15 separate from the majority that do not bind target. A 
diagonal line of non-binding CPs win form. This line 
is excised and discarded. Other parts of the gel are 
dissolved and the CPs cultured. 

Sec. 16.0: The Next Variegation Cycle:

We now consider which residues of the PBD should
be varied in the next variegation cycle. The general
rule is to preserve as much accumulated information as
possible. If the level of variegation in the previous
variegation cycle was correctly chosen, then the amino
acids selected to be in the residues just varied are
the ones best determined. The environment of other
residues has changed, so that it is appropriate to
vary them again. Because there are always more
residues in the principal (Sec. 13.1.1) and secondary
sets (Sec. 13.1.2) than can be varied simultaneously,
we start by picking residues that either have never
been varied (highest priority) or that have not been
varied for one or more cycles. If we find that
varying all the residues except those varied in the
previous cycle does not allow a high enough level of
diversity, then residues varied in the previous cycle
might be varied again. For example, if M_ntv (the
number of independent transformants that can be
produced from Y_D100 of DNA) and C_sensi (the
sensitivity of the affinity separation) were such that
seven residues could be varied, and if the principal
and secondary sets contained 13 residues, we would
always vary seven residues, even though that implies
varying some residue twice in a row. In such cases,
we would pick the residues just varied that contain
the amino acids of highest abundance in the variegated
codons used.

It is the accumulation of information that allows
the process to select those protein sequences that
produce binding between the SBD and the target. Some
interfaces between proteins and other molecules
involve twenty or more residues. Complete variation
of twenty residues would generate 10^26 different
proteins. By dividing the residues that lie close
together in space into overlapping groups of five to
seven residues, we can vary a large surface but never
need to test more than 10^7 to 10^9 candidates at once,
a savings of 10^19 to 10^17 fold. The power of
selection with accumulation of information is well
illustrated in Chapter 3 of DAWK86.
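
The orders of magnitude in this comparison can be
checked with a short sketch. It assumes that all 20
amino acids are allowed at every varied residue and it
counts protein sequences only, ignoring codon
degeneracy. The name library_size is illustrative and
is not part of the method described here.

```python
# Count protein sequences when n residues are each varied
# through all 20 amino acids (assumption: full variation,
# codon degeneracy ignored).
AMINO_ACIDS = 20


def library_size(n_residues: int, alphabet: int = AMINO_ACIDS) -> int:
    """Number of distinct sequences for n fully varied residues."""
    return alphabet ** n_residues


full = library_size(20)        # all twenty residues at once
group_small = library_size(5)  # one group of five residues
group_large = library_size(7)  # one group of seven residues

print(f"20 residues at once: {full:.1e}")
print(f"group of 5 residues: {group_small:.1e}")
print(f"group of 7 residues: {group_large:.1e}")
print(f"savings: {full / group_large:.0e} to {full / group_small:.0e} fold")
```

Twenty to the fifth power is about 3 x 10^6 and twenty
to the seventh is about 1.3 x 10^9, so the figures in
the text are rounded orders of magnitude rather than
exact counts.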

Having picked the residues to vary, we again set
the range of variegation for each residue according to
the principles set forth in Sec. 13.2, design the vgDNA
encoding the desired mutants (Sec. 13.3), clone the
vgDNA into GPs (Sec. 14), and select-by-binding-to-
target those GPs bearing SBDs (Sec. 15).

Sec. 17.0: OTHER CONSIDERATIONS:

Sec. 17.1: Joint selections: 

One may modify the affinity separation of the
method described to select a molecule that binds to
material A but not to material B. One needs to
prepare two selection columns, one with material A and
the other with material B. The population of genetic
packages is prepared in the manner described, but
before applying the population to A, one passes the
population over the B column so as to remove those
members of the population that have high affinity for
B ("reverse affinity chromatography"). In the
preceding specification, the initial column supported
some other molecule simply to remove GP(PBD)s that
displayed PBDs having indiscriminate affinity for
surfaces.

It may be necessary to amplify the population
that does not bind to B before passing it over A.
Amplification would most likely be needed if A and B
were in some ways similar and the PPBD has been
selected for having affinity for A. The optimum order
of interactions might be determined empirically.
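
The logic of depleting on B and then retaining on A can
be illustrated with a toy model. Everything in it is an
assumption made for illustration: the Package record,
the affinity values, and the single binding threshold
are invented, and a real column separates by a gradient
rather than by a sharp cutoff.

```python
# Toy model of joint selection: deplete on a B column, then
# retain on an A column. Affinities and the threshold are
# invented numbers used only for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str
    affinity_a: float  # arbitrary units, higher binds tighter
    affinity_b: float


def flow_through(population, affinity_of, threshold):
    """Members that do not bind the column and pass straight through."""
    return [p for p in population if affinity_of(p) < threshold]


def retained(population, affinity_of, threshold):
    """Members that bind the column and are recovered by elution."""
    return [p for p in population if affinity_of(p) >= threshold]


population = [
    Package("gp1", affinity_a=0.9, affinity_b=0.1),
    Package("gp2", affinity_a=0.8, affinity_b=0.9),
    Package("gp3", affinity_a=0.2, affinity_b=0.1),
    Package("gp4", affinity_a=0.1, affinity_b=0.8),
]

THRESHOLD = 0.5  # hypothetical cutoff between "binds" and "does not bind"

# Step 1: reverse affinity chromatography on B removes B binders.
not_b = flow_through(population, lambda p: p.affinity_b, THRESHOLD)
# Step 2: ordinary affinity chromatography on A keeps A binders.
a_not_b = retained(not_b, lambda p: p.affinity_a, THRESHOLD)

print([p.name for p in a_not_b])
```

Only the member that binds A and fails to bind B
survives both steps. An amplification step between the
two columns, as suggested above, would correspond to
regrowing not_b before the second filter.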

For example, to obtain an SBD that binds A but
not B, three columns could be connected in series: a)
a column supporting some compound, neither A nor B, or
only the matrix material, b) a column supporting B,
and c) a column supporting A. A population of
GP(vgPBD)s is applied to the series of columns and the
columns are washed with the buffer of constant ionic
strength that is used in the application. The columns
are uncoupled, and the third column is eluted with a
gradient to isolate GP(PBD)s that bind A but not B.

One can also generate molecules that bind to both
A and B. In this case we can use a 3D model and
mutate one face of the molecule in question to get
binding to A. One can then mutate a different face
to produce binding to B. When an SBD binds at least
somewhat to both A and B, one can mutate the chain by
Diffuse Mutagenesis to refine the binding and use a
sequential joint selection for binding to both A and
B.

The materials A and B could be proteins that
differ at only one or a few residues. For example, A
could be a natural protein for which the gene has been
cloned and B could be a mutant of A that retains the
overall 3D structure of A. SBDs selected to bind A
but not B must bind to A near the residues that are
mutated in B. If the mutations were picked to be in
the active site of A (assuming A has an active site),
then an SBD that binds A but not B will bind to the
active site of A and is likely to be an inhibitor of
A.

To obtain a protein that will bind to both A and
B, we can, alternatively, first obtain an SBD that
binds A and a different SBD that binds B. We can then
combine the genes encoding these domains so that a
two-domain single-polypeptide protein is produced.
The fusion protein will have affinity for both A and B
because one of its domains binds A and the other binds
B.




that are widely spaced so that we leave as little as
possible of the original surface unaltered.

Destroying binding frequently requires only that
a single amino acid in the binding interface be
changed. If polyclonal antibodies are used, we face
the problem that all or most of the strong epitopes
must be altered in a single molecule. Preferably, one
would have a set of monoclonal antibodies, or a narrow
range of antibody species. If we had a series of
monoclonal antibody columns, we could obtain one or
more mutations that abolish binding to each monoclonal
antibody. We could then combine some or all of these
mutations in one molecule to produce a
pharmacologically important protein recognized by none
of the monoclonal antibodies. Such mutants must be
tested to verify that the pharmacologically
interesting properties have not been altered to an
unacceptable degree by the mutations.

Typically, polyclonal antibodies display a range
of binding constants for antigen. Even if we have
only polyclonal antibodies that bind to the
pharmacologically important protein, we may proceed as
follows. We engineer the pharmacologically important
protein to appear on the surface of a replicable GP.
We introduce mutations into residues that are on the
surface of the pharmacologically important protein or
into residues thought to be on the surface of the
pharmacologically important protein so that a
population of GPs is obtained. Polyclonal antibodies
are attached to a column and the population of GPs is
applied to the column at low salt. The column is
eluted with a salt gradient. The GPs that elute at
the lowest concentration of salt are those which bear
pharmacologically important proteins that have been
mutated in a way that eliminates binding to the
antibodies having maximum affinity for the
pharmacologically important protein. The GPs eluting
at the lowest salt are isolated and cultured. The
isolated SBD becomes the PPBD to further rounds of
variegation so that the antigenic determinants are
successively eliminated.



Sec. 17.3: Selection of PBDs for retention of



Let us take an SBD with known affinity for a
target as PPBD to a variegation of a region of the PBD
that is far from the residues that were varied to
create the SBD. We can use the target as an affinity
molecule to select the PBDs that retain binding for
the target, and that presumably retain the underlying
structure of the IPBD. The variegations in this case
could include insertions and deletions that are likely
to disrupt the IPBD structure. We could also use the
IPBD and AfM(IPBD) in the same way.

For example, if IPBD were BPTI and AfM(BPTI) were
trypsin, we could introduce four or five additional
residues after residue 26 and select GPs that display
PBDs having specific affinity for AfM(BPTI). Residue
26 is chosen because it is in a turn and because it is
about 25 A from K15, a key amino acid in binding to
trypsin.

The underlying structure is most likely to be
retained if insertions or deletions are made at loops
or turns.

Sec. 17.4: Engineered binding proteins not unique:

For each target, there are a large number of SBDs
that may be found by the method of the present
invention. The process relies on a combination of
protein structural considerations, probabilities, and
targeted mutations with accumulation of information.
To increase the probability that some PBD in the
population will bind to the target, we generate as
large a population as we can conveniently subject to
selection-through-binding in one experiment. Key
questions in management of the method are "How many
transformants can we produce?", and "How small a
component can we find through selection-through-
binding?". Geneticists routinely find mutations with
frequencies of one in 10^10 using simple, powerful
selections; we experimentally determine the
sensitivity of our procedure. The optimum level of
variegation is determined by the maximum number of
transformants and the selection sensitivity, so that
for any reasonable sensitivity we may use a
progressive process to obtain a series of proteins
with higher and higher affinity for the chosen target
material. Enrichments of 1000-fold by a single pass
of elution from an affinity plate have been
demonstrated (SMIT85). Three rounds of such
enrichment could produce 10^9-fold enrichment, and
additional rounds may be added if necessary.

Use of different variation schemes can yield
different binding proteins. For any given target,
there is a large plurality of proteins that will bind
to it. Thus, if one binding protein turns out to be
unsuitable for some reason (e.g., too antigenic), the
procedure can be repeated with different variation
parameters. For example, one might choose different
residues to vary or pick a different nt distribution
at variegated codons so that a new distribution of
amino acids is tested at the same residues. Even if
the same principal set of residues is used, one might
obtain a different SBD if the order in which one picks
subsets to be varied is altered.
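
The compounding of enrichment over rounds, mentioned
earlier in this section, can be sketched as follows.
The sketch assumes that each round multiplies the
enrichment by the same factor, which is an
idealization. The required enrichment of 10^8 used in
the example is a hypothetical figure and is not taken
from the text.

```python
# Idealized model: every round of affinity selection
# multiplies the enrichment by the same constant factor.


def total_enrichment(per_round: float, rounds: int) -> float:
    """Overall fold enrichment after a number of identical rounds."""
    return per_round ** rounds


def rounds_needed(per_round: float, required_fold: float) -> int:
    """Smallest number of rounds whose combined enrichment reaches
    the required fold enrichment."""
    if per_round <= 1:
        raise ValueError("per_round must exceed 1 for enrichment to grow")
    rounds = 0
    achieved = 1.0
    while achieved < required_fold:
        achieved *= per_round
        rounds += 1
    return rounds


three_rounds = total_enrichment(1000, 3)
print(f"three 1000-fold rounds give {three_rounds:.0e}-fold enrichment")

# Hypothetical requirement: recover a component present at 1 in 10^8.
print("rounds needed for 1e8-fold:", rounds_needed(1000, 1e8))
```

In practice the enrichment per round need not stay
constant, which is one reason the sensitivity of the
procedure is determined experimentally.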

Sec. 17.5: Other modes of mutagenesis possible:

The modes of creating diversity in the population
of GPs discussed herein are not the only modes
possible. Any method of mutagenesis that preserves at
least a large fraction of the information obtained
from one selection and then introduces other mutations
in the same domain will work. The limiting factors
are the number of independent transformants that can
be produced and the amount of enrichment one can
achieve through affinity separation. Therefore the
preferred embodiment uses a method of mutagenesis that
focuses mutations into those residues that are most
likely to affect the binding properties of the PBD and
are least likely to destroy the underlying structure
of the IPBD.



Other modes of mutagenesis might allow other GPs
to be considered. For example, the bacteriophage
lambda is not a useful cloning vehicle for cassette
mutagenesis because of the plethora of restriction
sites. One can, however, use single-stranded-oligo-
nt-directed mutagenesis on lambda without the need for
unique restriction sites. No one has used single-
stranded-oligo-nt-directed mutagenesis to introduce
the high level of diversity called for in the present
invention, but if it is possible, such a method would
allow use of phage with large genomes.

BPTI-Derived Binding Protein for HHMb
Displayed by M13 Phage



Presented below is a hypothetical example of a
protocol for developing a new binding molecule derived
from BPTI with affinity for horse heart myoglobin
(HHMb) using the common E. coli bacteriophage M13 as
genetic package. It will be understood that some
further optimization, in accordance with the teachings
herein, may be necessary to obtain the desired results.
Possible modifications in the preferred method are
discussed immediately following various steps of the
hypothetical example.

By hypothesis, we set the following technical
capabilities:



^DNA 



500 ng/synthesis of ssDNA 100 bases long,
10 ug/synthesis of ssDNA 60 bases long,
1 mg/synthesis of ssDNA 20 bases long.

100 bases 

1 mg/1 

0.1 I for blunt-blunt, 
•J \ for sCicKy-blunt , 

11 \ for sticky-sticky. 

5 X 10^ 
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900-fold enrichment 
1 in 4 X 10^ 
10 passes 
0.05 



In this example, we will use M13 as a replicable
GP and BPTI as IPBD. The considerations that lead to
these choices are discussed. In Part I, we are
concerned only with getting BPTI displayed on the outer
surface of an M13 derivative. Variable DNA may be
introduced in the osp-ipbd gene, but not within the
region that codes for the trypsin-binding region of
BPTI. Once BPTI is displayed on the outer surface
of an M13 derivative, we proceed to Part II to optimize
the affinity separation procedures.

We consider various GPs and, for this example,
choose a filamentous bacteriophage of E. coli, M13. We
prefer phage over vegetative bacterial cells because
phage are much less metabolically active. We prefer
phage over spores because the molecular mechanisms of
the virion formation and 3D structure of the virion are
much better understood than are the corresponding
processes of spore formation and structures of spores.

M13 is a very well studied bacteriophage, widely
used for DNA sequencing and as a genetic vector; it is
a typical member of the class of filamentous phages.
The relevant facts about M13 and other phages that will
allow us to choose among phages are cited in Sec.
1.3.1.

Compared to other bacteriophage, filamentous phage
in general are attractive and M13 in particular is
especially attractive because:
1) the 3D structure of the virion is known,
2) the processing of the coat protein is well
understood,
3) the genome is expandable,
4) the genome is small,
5) the sequence of the genome is known,
6) the virion is physically resistant to shear,
heat, cold, guanidinium Cl, low pH, and high salt,
7) the phage is a sequencing vector so that
sequencing is especially easy, and
8) antibiotic-resistance genes have been cloned
into the genome with predictable results (HINE80).

Other criteria listed in Sec. 1.0 and 1.3 are
also satisfied: M13 is easily cultured and stored
(FRIT85), each infected cell yielding 100 to 1000 M13
progeny after infection. M13 has no unusual or
expensive media requirements and is easily harvested
and concentrated (SALI64, FRIT85). M13 is stable
toward physical agents: temperature (10% of phage
survive 30 minutes at 85°C), shear (Waring blender does
not kill), desiccation (not applicable), radiation (not
applicable), age (stable for years).

M13 is stable toward chemicals: pH (< 2.2
(SMIT85)), surface active agents (not applicable),
chaotropes (guanidinium HCl = 6.0 M), ions (no specific
sensitivities), organic solvents (ether and other
organic solvents are lethal (HARV78)), proteases (not
applicable, HHMb not a protease). M13 is not known to
be sensitive to other enzymes.

The M13 genome is 6423 b.p. and the sequence is known
(SCHA78). Because the genome is small, cassette
mutagenesis is practical on RF M13 (AUSU87), as is
single-stranded oligo-nt directed mutagenesis (FRIT85).
M13 is a plasmid and transformation system in itself,
and an ideal sequencing vector. M13 can be grown on
Rec- strains of E. coli. The M13 genome is expandable
(MESS78, FRIT85). M13 confers no advantage, but
doesn't lyse cells. The sequence of gene VIII is
known, and the amino acid sequence can be encoded on a
synthetic gene, using the lacUV5 promoter and used in
conjunction with the LacI^q repressor. The lacUV5
promoter is induced by IPTG. Gene VIII protein is
secreted by a well studied process and is cleaved
between A23 and A24. Residues 18, 21, 22, and 23 of
gene VIII protein control cleavage. Mature gene VIII
protein makes up the sheath around the circular ssDNA.
The 3D structure of f1 virion is known at medium
resolution; the amino terminus of gene VIII protein is
on the surface of the virion. No fusions to M13 gene VIII
protein have been reported. The 2D structure of M13
coat protein is implicit in the 3D structure. Mature
M13 gene VIII protein has only one domain. There are
four minor proteins: gene III, VI, VII, and IX. Each
of these minor proteins is present in about 5 copies
per virion and is related to morphogenesis or
infection. The major coat protein is present in more
than 2500 copies per virion.

Although no fusions of M13 gene VIII to other
genes have been reported, knowledge of the virion 3D
structure makes attachment of IPBD to the amino
terminus of mature M13 coat protein (M13 CP) quite
attractive (See Sec. 1.3.2). Should direct fusion of
BPTI to M13 CP fail to cause BPTI to be displayed on
the surface of M13, we will vary part of the BPTI
sequence and/or insert short random DNA sequences
between BPTI and M13 CP (Sec. 1.3.4).

Smith (SMIT85) and de la Cruz et al. (CRUZ88) have
shown that insertions into gene III cause novel protein
domains to appear on the virion outer surface. If BPTI
can not be made to appear on the virion outer surface
by fusing the bpti gene to the M13 CP gene, we will fuse
bpti to gene III either at the site used by Smith and
by de la Cruz et al. or to one of the termini. We will
use a second, synthetic copy of gene III so that some
unaltered gene III protein will be present.

The gene VIII protein is chosen as OSP because it
is present in many copies and because its location and
orientation in the virion are known. Note that any
uncertainty about the azimuth of the coat protein about
its own alpha helical axis is unimportant; the amino
terminus is exposed for all azimuths.

The 3D model of f1 indicates strongly that fusing
BPTI to the amino terminus of M13 CP is more likely to
yield a functional protein than any other fusion site.
(See Sec. 1.3.3).



The amino-acid sequence of M13 pre-coat (SCHA78),
called AA_seq1, is

AA_seq1
         1    1    2    2    3    3    4    4    5
    5    0    5    0    5    0    5    0    5    0
MKKSLVLKASVAVATLVPMLSFAAEGDDPAKAAFNSLQASATEYIGYAWA
                      ^
    5    6    6    7  7
    5    0    5    0  3
MVVVIVGATIGIKLFKKFTSKAS

The single-letter codes for amino acids and the codes
for ambiguous DNA are given in Table 1. The best site
for inserting a novel protein domain into M13 CP is
after A23 because SP-I cleaves the precoat protein
after A23, as indicated by the arrow. Proteins that
can be secreted will appear connected to mature M13 CP
at its amino terminus. Because the amino terminus of
mature M13 CP is located on the outer surface of the
virion, the introduced domain will be displayed on the
outside of the virion.
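
The placement of an inserted domain relative to the
cleavage site can be made concrete with a small sketch.
The sequence string is our reading of AA_seq1, and the
four-letter insert "XXXX" is a hypothetical placeholder,
not a sequence from this example.

```python
# M13 pre-coat sequence as read from AA_seq1 (single-letter codes).
AA_SEQ1 = (
    "MKKSLVLKASVAVATLVPMLSFAAEGDDPAKAAFNSLQASATEYIGYAWA"
    "MVVVIVGATIGIKLFKKFTSKAS"
)
CLEAVAGE_AFTER = 23  # the precoat is cleaved after residue A23


def split_precoat(seq: str, cut_after: int = CLEAVAGE_AFTER):
    """Return (signal peptide, mature coat protein)."""
    return seq[:cut_after], seq[cut_after:]


def fuse_at_cleavage_site(precoat: str, insert: str,
                          cut_after: int = CLEAVAGE_AFTER) -> str:
    """Place an inserted domain between the signal peptide and the
    mature coat protein, so that after cleavage the insert sits at
    the amino terminus of the mature protein."""
    signal, mature = split_precoat(precoat, cut_after)
    return signal + insert + mature


signal, mature = split_precoat(AA_SEQ1)
print("signal peptide:", signal)
print("mature coat protein:", mature)

# "XXXX" is a placeholder insert used only to show the position.
fusion = fuse_at_cleavage_site(AA_SEQ1, "XXXX")
print("fusion precursor:", fusion)
```

After removal of the signal peptide, the product in this
sketch begins with the insert and continues with the
mature coat protein, which is the arrangement described
above.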

BPTI is chosen as IPBD of this example (See Sec.
2.1) because it meets or exceeds all the criteria: it
is a small, very stable protein with a well known 3D
structure. Marks et al. (MARK86) have shown that a
fusion of the phoA signal peptide gene fragment and DNA
coding for the mature form of BPTI caused native BPTI
to appear in the periplasm of E. coli, demonstrating
that there is nothing in the structure of BPTI to
prevent its being secreted.

Marks et al. (MARK87) also showed that the
structure of BPTI is stable even to the removal of one
of the cystine bridges. They did this by replacing
both C14 and C38 with either two alanines or two
threonines. The C14/C38 cystine bridge that Marks et
al. removed is the one very close to the scissile bond
in BPTI; surprisingly, both mutant molecules
functioned as trypsin inhibitors. This indicates that
BPTI is redundantly stable and so is likely to fold
into approximately the same structure despite numerous
surface mutations. Using the knowledge of homologues,
vide infra, we can infer which residues must not be
varied if the basic BPTI structure is to be maintained.



The 3D structure of BPTI has been determined at
high resolution by X-ray diffraction (HUBE77, MARQ83,
WLOD84, WLOD87a, WLOD87b), neutron diffraction
(WLOD84), and by NMR (WAGN87). In one of the X-ray
structures deposited in the Brookhaven Protein Data
Bank, "6PTI", there was no electron density for A58,
indicating that A58 has no uniquely defined
conformation. Thus we know that the carboxy group does
not make any essential interaction in the folded
structure. The amino terminus of BPTI is very near to
the carboxy terminus. Goldenberg and Creighton
reported on circularized BPTI and circularly permuted
BPTI (GOLD83). Some proteins homologous to BPTI have
more or fewer residues at either terminus.

BPTI has been called "the hydrogen atom of protein
folding" and has been the subject of numerous
experimental and theoretical studies (STAT87, SCHW87,
GOLD83, CHAZ83).

BPTI has the added advantage that homologous
proteins are known, as shown in Table 13. A
tally of ionizable groups is shown in Table 14 and a
composite of amino acid types occurring at each residue
is shown in Table 15.

BPTI is freely soluble and is not known to bind
metal ions. BPTI has no known enzymatic activity.
BPTI binds to trypsin, K_d = 6.0 x 10^-14 M (TSCH87).
BPTI is not toxic. If K15 of BPTI is changed to L,
there is no measurable binding between the mutant BPTI
and trypsin (TSCH87).

Stereo Figure 7 shows the alpha carbons of BPTI
plus the side groups of conserved residues; all four
atoms of conserved glycines are shown. All of the
conserved residues are buried; of the seven fully
conserved residues only G37 has noticeable exposure.
The solvent accessibility of each residue in BPTI is
given in Table 16 which was calculated from the entry
"6PTI" in the Brookhaven Protein Data Bank with a
solvent radius of 1.4 A, the atomic radii given in
Table 7, and the method of Lee and Richards (LEE71).
Each of the 51 non-conserved residues can accommodate
two or more kinds of amino acids. By independently
substituting at each residue only those amino acids
already observed at that residue, we could obtain
25 approximately 7 x 10-^2 different nnino acid sequences, 
most of which will fold into structures very similar to
BPTI.

BPTI will be useful as an IPBD for macromolecules
(See Sec. 2.1.1). BPTI and BPTI homologues bind tightly
and with high specificity to a number of enzymes.

BPTI is strongly positively charged except at very
high pH, thus BPTI is useful as IPBD for targets that
are not also strongly positive under the conditions of
intended use (see Sec. 2.1.2). There exist homologues
of BPTI, however, having quite different charges (viz.
SCI-III from Bombyx mori at -7 and the trypsin
inhibitor from bovine colostrum at -1). Once a
derivative of M13 is found that displays BPTI on its
surface, the sequence of the BPTI domain can be
replaced by one of the homologous sequences to produce
acidic or neutral IPBDs.

BPTI is not an enzyme (See Sec. 2.1.3). BPTI is
quite small; if this should cause a pharmacological
problem, two or more BPTI-derived domains may be joined
as in the human BPTI homologue that has two domains.



A derivative of M13 is the preferred OCV (See
Sec. 3). Wild-type M13 does not confer any resistances
on infected cells; M13 is a pure parasite. A
"phagemid" is a hybrid between a phage and a plasmid,
and is used in this invention. Double-stranded plasmid
DNA isolated from phagemid-bearing cells is denoted by
the standard convention, e.g. pXY24. Phage prepared
from these cells would be designated XY24. Phagemids
such as Bluescript K/S (sold by Stratagene) are not
suitable for our purposes because Bluescript does not
contain the full genome of M13 and must be rescued by
coinfection with competent wild-type M13. Such
coinfections will likely lead to genetic recombination
yielding heterogeneous phage unsuitable for the
purposes of the present invention.

It is also well known that plasmids containing the
ColE1 origin of replication can be greatly amplified if
protein synthesis is halted in a log-phase culture.
Protein synthesis can be halted by addition of
chloramphenicol or other agents (MANI82).

The bacteriophage M13 bla 61 (ATCC 37039) is
derived from wild-type M13 through the insertion of the
beta lactamase gene (HINE80). This phage contains 8.13
kb of DNA. M13 bla cat 1 (ATCC 37040) is derived from
M13 bla 61 through the additional insertion of the
chloramphenicol resistance gene (HINE80); M13 bla cat 1
contains 9.88 kb of DNA. Although neither of these
variants of M13 contains the ColE1 origin of
replication, either could be used as a starting point
to construct a usable cloning vector for the present
example.



The OCV for the current example is constructed by
a process illustrated in Figure 8. A brief description
of all the plasmids and phagemids constructed for this
Example is found in Table 17.

For ss oligo-nt site-directed mutagenesis,
multiple primers lead to higher efficiency. Three non-
mutagenic primers are used:



5'C232ri) GCC CGC TCT GAG GGT CCC GOT (2352) 3' wt.Mi3 
2 5 3.' ccg ccg aga etc ccn ccg czi 5' oliq.= 2 4 , 



'5' (485-;) CCT CCT GCC TCT CAG CCC CGC (4875) 3 ' vt.'U3 
3 0 '3' eg.", cga ccg aga gtc gcg vcg 5' olig = 25 , 



35 



and 



5' (34 51) CCC GTC AGC GTG GGT CTC CCG ( 34 3 1 ) 3' 

3' ggc cac teg cac cca gag cgc 5' oliqr26 



Olig#24 is complementary to a segment near the end of
M13 gene III and olig#25 is complementary to part of
gene IV (SCHA78). Olig#26 is part of the amp^R gene
from pBR322 (MANI82, Appendix D); the numbers shown
refer to pBR322 base pair numbers. Note that pLG2 and
its derivatives carry the anti-sense strand of the amp^R
gene in the + DNA strand. The segments are picked to
be high in GC content and to divide the pLG7 genome
into several segments of approximately equal length.
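
Two of the properties mentioned here, complementarity
of a primer to its template and GC content, are easy to
check computationally. The helper names below are
illustrative, and the short sequences in the usage lines
are made-up test strings, not the primers of this
example.

```python
# Helpers for checking primer/template complementarity and GC
# content. Sequences are written 5'-to-3' throughout.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence, preserving letter case."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    if not seq:
        raise ValueError("sequence must not be empty")
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def is_complementary(primer: str, template: str) -> bool:
    """True if the primer anneals to the template over its whole
    length, i.e. the primer is the reverse complement of the template."""
    return reverse_complement(primer).upper() == template.upper()


# Made-up test strings, used only to exercise the helpers.
template = "ATGCGC"
primer = reverse_complement(template)
print(primer, is_complementary(primer, template))
print(f"GC fraction of template: {gc_fraction(template):.2f}")
```

A primer with a high GC fraction generally anneals more
stably, which is consistent with the stated preference
for GC-rich segments.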



The genetic engineering procedures needed to
construct the OCV are standard. All restriction
digests use commercially available enzymes and are
carried out under conditions recommended by the
supplier. All restriction fragments of DNA are
purified by HPLC or electrophoresis from agarose gels
as described elsewhere in the present invention.
Competent E. coli are preferably prepared by a modified
version of the procedure of Maniatis (MANI82) given in
the generic detail section. M13 and its engineered
derivatives are infected into E. coli strain PE384
(F+, Rec-, Sup+, Amp^S). Plasmid DNA of M13 derivatives is
transferred into E. coli strain PE383 (F-, Rec-,
Sup+, Amp^S) so that we avoid multiple infections that
might arise once phage are produced. Isolation of M13
phage is by the procedure of Salivar et al. (SALI64);
isolation of replicative form (RF) M13 is by the
procedure of Jazwinski et al. (JAZW73a and JAZW73b).
Isolation of plasmids containing the ColE1 origin of
replication is by the method of Maniatis (MANI82).

DNA sequencing is by the method of Sanger
(AUSU87). Virions of M13 derivatives contain circular
ss DNA that is called the viral + strand. Base numbers
are assigned from an agreed origin and in ascending
order in the 5'-to-3' direction of the viral + strand.
Conventionally, this DNA is drawn with the 5'-to-3'
direction clockwise and corresponding to increasing
base number. In relation to the genomes of M13
derivatives, we will use "up" or "above" to mean higher
base number or further along in clockwise direction.
Similarly "down" and "below" will mean lower base
number or further along in the counterclockwise
direction. To determine the base sequence of part of
an M13 derivative, one needs a sequencing primer that
is complementary to a region above and within about 100
bases of the region to be sequenced. Because the OCV
is constructed from parts of M13mp18, parts of pBR322,
and synthetic DNA, the sequence of flanking regions is
always known.

We pick the amp^R gene from pBR322 as a convenient
antibiotic resistance gene. Another resistance gene,
such as kanamycin, could be used. (The New England
BioLabs 1988/89 catalogue contains a genetic map of
pBR322 on page 106.) The plasmid pBR322 also contains
the ColE1 origin of replication. The restriction sites
Acc I at 2246 and Aat II at 4286 are the most
convenient places to cut pBR322 to obtain both an
intact amp^R gene and the ColE1 origin of replication
with ends suitable for ligation to other DNA.
25 \ 

The plasmid pBR322 contains a unique AlwN I site at base
2886 that is between the ampR gene and ori. There is a
unique AlwN I site in M13mp18 at base 2137. When the
Acc I-to-Aat II fragment of pBR322 is ligated into M13mp18,
there will be two AlwN I sites and no easy way to excise the
ampR gene. Thus we convert the AlwN I site of pBR322 into an
Xba I site that will be unique in all the DNA constructs of
the present example. The two oligo-nts:

5' ccqiTCTAGAot^igtcqCCA 3' olig = 60 

3' CGTcgctAGATCTqoccagc 5' oliq:6l C 

are synthesized by standard methods and annealed. The
AlwN I site at base 2886 in pBR322 has the sequence
5'-CAGCCACTG-3'. Plasmid pBR322 is cut with AlwN I and mixed
with the synthetic ds DNA and ligated. Cells are transformed
and selected for tetracycline resistance.
Tetracycline-resistant colonies are screened for the correct
insert by restriction digestion with Xba I that cuts the
correct construction but not pBR322. The correct
construction is called pLG322. Plasmid pLG322 differs from
pBR322 only by the replacement of the AlwN I site with an
Xba I site.

The plasmid pLG322 contains a second Acc I restriction site
at base 651 so that digestion of pLG322 with Aat II and
Acc I yields three fragments, one of about 2041 bases (that
we want), one of about 729 bases, and one of about 1600
bases. To facilitate isolation of the 2041-base fragment, we
also digest pLG322 with Sty I that cuts at base 1369. The
Sty I cut reduces the 1600-base fragment to two fragments of
about 700 and about 900 bases each. We purify the 2041-nt
fragment by HPLC or agarose gel electrophoresis.
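
The fragment sizes quoted above follow from the cut positions on a circular molecule. The sketch below is illustrative only: it assumes the published pBR322 length of 4361 bp (not stated in the text), ignores the few bases added by the Xba I linker in pLG322, and the function name is hypothetical. The results agree with the text's "about" figures to within a few bases.

```python
def circular_fragments(length, cut_sites):
    """Return fragment sizes obtained by cutting a circular molecule
    of the given length at each position in cut_sites."""
    sites = sorted(set(cut_sites))
    sizes = []
    for i, site in enumerate(sites):
        nxt = sites[(i + 1) % len(sites)]
        # A single cut linearizes the circle: one fragment of full length.
        sizes.append((nxt - site) % length or length)
    return sizes

PBR322_LENGTH = 4361  # assumption: published pBR322 length

# Acc I at 651 and 2246, Aat II at 4286 (positions from the text)
aat_acc = sorted(circular_fragments(PBR322_LENGTH, [651, 2246, 4286]))

# Adding the Sty I cut at 1369 splits the ~1600-base fragment
with_sty = sorted(circular_fragments(PBR322_LENGTH, [651, 1369, 2246, 4286]))

print(aat_acc)   # three fragments; the largest is the one wanted
print(with_sty)  # four fragments; the largest is unchanged
```

After the Sty I cut the wanted fragment is more than twice the size of any other, which is why the extra digest eases purification.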

M13mp18, sold by New England BioLabs, contains neither
Aat II nor Acc I sites. Therefore we insert an adaptor that
allows us to insert the Aat II-to-Acc I fragment of pLG322
that carries the ampR gene and the ColE1 origin of
replication into a desirable place in M13mp18. M13mp18
contains a lacUV5 promoter and a lacZ gene that are not
useful to the purposes of the present invention. By cutting
M13mp18 with Ava II at the unique site at 5914 and with
Bsu36 I at the unique site at 6508 and discarding the
approximately 600 intervening base pairs, we eliminate all
recognition sites of the enzymes shown in Table 18 from
M13mp18.

M13mp18 itself is not cut by the enzymes listed in
Table 19. Among the enzymes in Tables 18 and 19, those
listed in Table 20 have recognition sites within the
Acc I-to-Aat II fragment of pLG322 that contains the ampR
gene and the ColE1 origin of replication.

Therefore the following adaptor is synthesized,

5' GACCCACCTCCgcctcGTATACCCCACCCcatagctCC 3'    olig#1
3' CCTCCAGacggagCATATCCCCTCCCgcatcgaCCACT 5'    olig#2
   Ava II/Aat II    Acc I/Rsr II    Bsu36 I

where the Ava II and Aat II sites share one GC base pair,
and the Acc I and Rsr II sites share a different GC pair.
The two 38-base oligo-nts are synthesized by a standard
procedure described elsewhere in the present invention; the
oligo-nts are annealed to each other. The bases shown in
lower case are spacers. In a later step, we will cut this
adaptor with both Aat II and Acc I; for both enzymes to cut
efficiently, there must be at least five bases between the
sites. Similarly, we will begin the construction of the
osp-ipbd gene by inserting DNA at the Rsr II and Bsu36 I
sites; thus these sites are separated by seven bases to
allow simultaneous cuts.

The annealed adaptor is ligated with RF M13mp18 that has
been cut with both Ava II and Bsu36 I and purified by HPLC
or polyacrylamide gel electrophoresis (PAGE). Cells are
transformed with the ligated DNA. DNA from colonies selected
on LB agar with ampicillin is screened by restriction
digestion. The desired construction can be cut with Rsr II
or Acc I, but not by any of the enzymes listed in Table 18.
Plasmid DNA from colonies that have the predicted
restriction digestion is sequenced in the region of the
insert to verify the construction. This construction retains
both the Ava II and the Bsu36 I sites. The resulting
construct is called pLG1.

The plasmid pLG1 is grown by standard techniques and DNA
isolated and cut with both Aat II and Acc I. After ligation,
there will still be Aat II and Acc I restriction sites at
the ends of the inserted DNA. The Aat II-to-Acc I fragment
of pBR322 is ligated to the backbone of pLG1. The ligated
DNA is used to transform competent E. coli that are plated
on ampicillin-containing plates after a short grow-out.



Ampicillin-resistant colonies are picked. Plasmid DNA of
the phagemid from the resistant colonies is digested with
Bsu36 I and Rsr II. To verify the construction, DNA from
phagemids with the correct restriction digestion pattern is
sequenced: a) from about 20 bases above the Bsu36 I site to
about 20 bases below the Rsr II site, and b) for about 30
bases either side of the unique Ava II site. The correct
construct is named pLG2.

The Acc I restriction site is no longer needed for vector
construction. To eliminate this site, RF pLG2 dsDNA is cut
with Acc I, treated with Klenow fragment and dATP and dTTP
to make it blunt and then religated. The ligated DNA is used
to transform competent cells; after a short grow-out,
ampicillin-resistant colonies are selected. Restriction
digestion is used to screen phagemid DNA from these
colonies; the desired product cannot be cut with Acc I. To
verify the construction, DNA from colonies lacking an Acc I
restriction site is sequenced from about 20 bases above the
former Acc I site to about 20 bases below it. The cloning
vector, named pLG3, is now ready for stepwise insertion of
the osp-ipbd gene.
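
Why filling in and religating destroys the Acc I site can be seen with a small top-strand simulation. This is a sketch under stated assumptions: Acc I is taken to recognize GTMKAC (M = A or C, K = G or T) and to cut after the GT, leaving a 2-nt 5' overhang; the function name and the test sequence are hypothetical and are not taken from pLG2.

```python
import re

# Acc I recognition sequence GT^MKAC (assumed cut position after GT)
ACC_I = re.compile(r"GT[AC][GT]AC")

def fill_in_and_religate(top_strand):
    """Cut the first Acc I site, fill in both 2-nt 5' overhangs with
    polymerase, and rejoin the blunt ends. Only the top strand is
    tracked; the bottom strand is implied by complementarity."""
    match = ACC_I.search(top_strand)
    if match is None:
        return top_strand
    cut = match.start() + 2                 # cleavage after 'GT'
    overhang = top_strand[cut:cut + 2]      # the two unpaired bases
    left = top_strand[:cut] + overhang      # fill-in extends the left end
    right = top_strand[cut:]                # right end already has them
    return left + right

site = "CCGTATACGG"          # hypothetical sequence with one Acc I site
product = fill_in_and_religate(site)
print(product)                               # the site gains two bases
print(ACC_I.search(product) is None)         # site no longer present
```

For the GTATAC form of the site the overhang is AT, so only dATP and dTTP are needed for the fill-in, as the text specifies.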

We are now ready to design a gene (See Sec. 4) that will
cause BPTI domains to appear on the outer surface of an M13
derivative: LG7.

To obtain a novel protein domain attached to the outside of
M13, we insert DNA that codes for mature BPTI after A23 of
the precoat protein of M13. Mature BPTI begins with an
arginine residue, which is charged; cleavage by signal
peptidase I is normal in such cases. Signal peptidase I
(SP-I) cuts a chimera of M13 coat protein and BPTI after A23
leaving mature BPTI attached at its carboxy end to the amino
terminus of M13 CP.

The following amino-acid sequence, called AA_seq2, is
constructed by inserting the sequence for mature BPTI (shown
underscored) immediately after the signal sequence of M13
precoat protein (indicated by the arrow) and before the
sequence for the M13 CP.

GLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA AEGDDPAKAAFNSLQASAT

  105  110  115  120  125  130
EYIGYAWAMVVVIVGATIGIKLFKKFTSKAS



We adopt the convention that sequence numbers of fusion
proteins refer to the fusion, as coded, unless otherwise
noted. Thus the alanine that begins M13 CP is referred to as
"number 82", "number 1 of CP", or "number 59 of the mature
BPTI-M13 CP fusion".



The osp-ipbd gene is regulated by the lacUV5 promoter, so
that the level of expression can be regulated by the
concentration of IPTG supplied in the growth medium. (See
Sec. 4.1). The host strain of E. coli should harbor the
lacIq gene, which represses the lacUV5 promoter to a greater
extent than does wild-type lacI. The osp-ipbd gene is ended
by the trp attenuator so that RNA polymerase will not read
through into subsequent genes. The osp-ipbd gene is
expressed and processed in parallel with the wild-type gene
VIII. The novel protein, that consists of BPTI tethered to a
M13 CP domain, constitutes only a fraction of the coat.
Affinity separation is able to separate phage carrying only
five or six copies of a molecule that has high affinity for
an affinity matrix (SMIT85); incorporation of the chimeric
protein results in about 30 copies of the protein exposed on
the surface. If this is insufficient, additional copies may
be provided.



Figure 9 shows, in stereo, a hypothetical model of a short
segment of the coat of a derivative of M13 in which some
coat protein monomers are fusions of mature BPTI to the
amino terminus of the normal M13 CP. The figure shows only
protein C-alphas; the DNA, not shown, lies inside the
cylinder. The model of M13 coat is after the model for fd of
Marvin and colleagues (BANN81). The BPTI domain is taken
from the Brookhaven Protein Data Bank entry "6PTI" and was
attached by standard model building methods that insure that
covalent bond lengths and angles are close to acceptable
values. The space between the alpha helical main chains is
filled by protein side groups so that the DNA is protected
from solvent. The figure is not meant to suggest that BPTI
fused to M13 CP will adopt the conformation shown, which is
arbitrary. Rather the model shows that the fusion protein
could fit into the supramolecular structure in a
stereochemically acceptable fashion without disturbing the
internal structure of either the M13 CP or BPTI domain.

The osp-ipbd gene will use: a) the lacUV5 promoter, b) a
Shine-Dalgarno sequence having high homology to natural
Shine-Dalgarno sequences, c) a completely synthetic coding
region having codons assigned to optimize placement of
restriction sites, and d) the trp attenuator as
transcriptional terminator. (See Secs. 4.1 and 4.2).

The ambiguous DNA sequence coding for AA_seq2, shown in
Table 3, is examined by PROSPECT for places where
recognition sites for any of the enzymes listed in Table 21
could be created without altering the amino-acid sequence.
(See Sec. 4.3). A master table of enzymes is compiled from
the catalogues of enzyme suppliers listed in Table 4. The
enzymes listed in Table 21 are those that do not cut the
OCV, the construction of which is described above. The codes
used in the ambiguous DNA are shown in Table 1.
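
The idea of scanning an ambiguous coding sequence for places where a restriction site can be created silently can be illustrated by brute force. The sketch below is not PROSPECT, whose implementation is not given here; it simply enumerates every synonymous coding of a short peptide under the standard genetic code and keeps those containing a chosen site. The function names and the BamHI example are illustrative assumptions.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def synonymous_codons(residue):
    """All codons that encode the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == residue]

def codings_with_site(peptide, site):
    """Return every DNA coding of `peptide` that contains `site`.
    Exhaustive, so only practical for short peptide windows."""
    hits = []
    for codons in product(*(synonymous_codons(aa) for aa in peptide)):
        dna = "".join(codons)
        if site in dna:
            hits.append(dna)
    return hits

# Gly-Ser can be coded so that it contains a BamHI site (GGATCC);
# Met-Trp has a single coding and cannot.
print(codings_with_site("GS", "GGATCC"))
print(codings_with_site("MW", "GGATCC"))
```

Sliding such a window along the whole amino-acid sequence, and repeating for each enzyme in the master table, gives the list of candidate site positions from which the designer then chooses.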

Using the procedures given in Sec. 4.3, we design an ipbd
gene, such as that shown in Table 22 and in Table 23. The
recognition sequences of commercially available enzymes that
recognize five or more bases are shown in Table 4. Some of
these enzymes (e.g. Ban I or Hph I) cut the OCV too often to
be of value. A summary of restriction sites in the designed
ipbd gene is given in Table 24.

The entire DNA sequence of the m13cp-bpti fusion with
annotation appears in Table 25 showing the useful
restriction sites and biologically important features, viz.
the lacUV5 promoter, the lacO operator, the Shine-Dalgarno
sequence, the amino acid sequence, the stop codons, and the
transcriptional terminator.

The ipbd gene is synthesized in several steps using the
method described in Sec. 5.1, generating ds DNA fragments of
150 to 190 base pairs. In this example, the 3' overlap
window (Nw) is set to run from 23 to 27, which is generous.
The end spacers (Ns) that are added to insure efficient
digestion are set to 8, which is also generous. Syntheses
designed with smaller overlaps and shorter spacers would
allow longer fragments of dsDNA to be synthesized and
consume less of the reagents. Note, however, that Oliphant
and Struhl (OLIP87) required large excesses of restriction
enzymes meant to cut near the ends of their dsDNA; this
could have been because they had set Ns=2.

All DNA synthesis and purification is done by standard
methods as described in Sec. 5.2.

The four steps (See Sec. 6.1) by which we clone synthetic
fragments of the m13cp-bpti gene (the osp-ipbd gene of the
present example) into pLG3 and its derivatives are
illustrated in Figure 20.

The sequence to be introduced into pLG3 is shown in
Table 26 and in Table 27. The segment is 158 bases long and
is synthesized from two shorter synthetic oligo-nts as
described in Sec. 5.1 of the generic specification. The
important features of this segment are five restriction
sites, the lacUV5 promoter, a Shine-Dalgarno site, and the
trpA attenuator as shown in Table 26.

Table 27 repeats the anti-sense strand shown in Table 26.
The 99 base fragment shown in upper case letters and
underscored (5'-CCGTCC....CCTTCG-3' = olig#3) is synthesized
in the standard manner. Similarly, the 100 base long
fragment of the sense strand shown in lower case
(5'-cgctca....aattg-3' = olig#4) is synthesized. After
annealing, the double-stranded region is extended with
Klenow fragment by the procedure given above to make the
entire 176 bases double stranded. The overlap region is 23
base pairs long and contains 14 GC pairs and 9 AT pairs. The
DNA between Avr II and Asu II does not code for anything in
the final ipbd gene; it is there so that the DNA can be cut
by both Avr II and Asu II at the same time in the next step.
This spacer was made rich in G and C so that annealing of
the two single-stranded DNA fragments will be efficient.
Eight bases have been added to the left of Rsr II and nine
bases have been added to the left of Sau I (same specificity
and cutting pattern as Bsu36 I). These bases at the ends are
not part of the final product; they must be present so that
the restriction enzymes can bind and cut the synthetic DNA
to produce specific sticky ends.
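
The lengths quoted for this step are related by simple arithmetic: two oligo-nts that anneal over a 3' overlap and are then extended give a duplex whose length is the sum of the two lengths minus the overlap. The sketch below checks the figures in the text; the function names are hypothetical.

```python
def extended_length(len_a, len_b, overlap):
    """Length of the duplex produced when two oligo-nts annealed
    over `overlap` bases at their 3' ends are fully extended."""
    return len_a + len_b - overlap

def gc_fraction(gc_pairs, at_pairs):
    """Fraction of GC pairs in an overlap region."""
    return gc_pairs / (gc_pairs + at_pairs)

# First segment: 99-base and 100-base oligo-nts, 23-base overlap
print(extended_length(99, 100, 23))      # 176, as stated in the text
print(round(gc_fraction(14, 9), 2))      # GC content of the overlap
```

The same relation shows the trade-off noted in the synthesis discussion: a shorter overlap or shorter end spacers leaves more of each oligo-nt available for unique sequence.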

The synthetic DNA is cut with both Sau I and Rsr II and
purified by HPLC or PAGE. RF pLG3 is cut with Sau I and
Ava II and purified by HPLC or agarose gel electrophoresis
and electroelution. The large piece from the phagemid and
the synthetic DNA are ligated and used to transform E. coli.
Ampicillin-resistant colonies are obtained and plasmids are
screened by endonuclease digestion of RF phagemid DNA. The
desired product can be cut by Avr II, Asu II, or BstE II,
but the original phagemid can not be cut by any of these
enzymes. To verify the insert, DNA from isolates that have
the correct restriction sites is sequenced from about 10
bases above the Sau I site to about 10 bases below the
Rsr II site. The construct with the correct insert is called
pLG4.

The second step of the construction of the OCV is
illustrated in Tables 28 and 29. This second segment of DNA
is 155 bases long. As in the construction of pLG4, two
pieces of single-stranded DNA are synthesized. A 99 base
long fragment of the anti-sense strand
(5'-CCACCA....CCTCCC-3' = olig#5) is shown in upper case
letters and underscored; the other piece of 99 bases
(5'-gatcta....atcacct-3' = olig#6) is shown in lower case
and is a fragment of the sense strand. These strands are
complementary over 24 bases, containing 14 GC base pairs and
10 AT base pairs. Klenow fragment is used to extend in both
directions to produce ds DNA. Both the synthetic dsDNA and
RF pLG4 DNA are cut with both Avr II and Asu II and purified
by HPLC or the appropriate type of gel electrophoresis. The
backbone from the phagemid pLG4 and the synthetic DNA are
ligated and used to transform E. coli.

Ampicillin-resistant colonies are obtained and plasmids are
screened by restriction digestion. The desired product can
be cut by any of Afl II, Nhe I, Nru I, Kpn I, Acc III,
Ava I, Xho I, PflM I, Apa I, Dra II, £ss I, or BssH II while
pLG4 can not be cut by any of these enzymes. To verify the
insert, DNA from phagemids with the correct restriction
sites is sequenced from about 10 bases above the BstE II
site to about 10 bases below the Avr II site. The construct
carrying this second insert is called pLG5.

Construction of pLG6 proceeds similarly to the construction
of pLG5. The sequences are shown in Tables 30 and 31. The
two single stranded segments (olig#7 and olig#8) are
synthesized, annealed, and extended with Klenow fragment.
The overlap region comprises 25 base pairs, 15 GC and 10 AT.
Both the synthetic DNA and pLG5 are cut with both BssH II
and Asu II, purified, and the appropriate pieces are ligated
and used to transform E. coli. Ampicillin-resistant colonies
are obtained and plasmids are screened by restriction
digestion. The desired phagemid can be cut with any of
Stu I, Acc I, Xca I, £sB I, ZZ^ III, OlL^i I, Bbe I, or
Nar I, while pLG5 can not be cut by any of these enzymes. To
verify the third insert, DNA from phagemids with the correct
restriction map is sequenced from about 10 bases above the
Asu II site to about 10 bases below the BssH II site. The
construct with the correct third insert is called pLG6.

The construction of pLG7 is illustrated in Tables 32 and 33
and proceeds similarly to the constructions of pLG4, pLG5,
and pLG6. The two single stranded segments (olig#9 and
olig#10) are synthesized, annealed, and extended with Klenow
fragment. Both the synthetic DNA and pLG6 are cut with both
Bbe I and Asu II, purified, and the appropriate pieces are
ligated and used to transform E. coli. Ampicillin-resistant
colonies are screened by restriction digestion of phagemid
RF DNA. The desired phagemid can be cut with any of Sfi I,
Hind III, Mlu I, BstX I, or Nco I, while pLG6 can be cut by
none of these enzymes. To verify the fourth insert, DNA from
phagemids with the correct restriction sites is sequenced
from about 10 bases above the Asu II site to about 10 bases
below the Bbe I site. The construct with the correct fourth
insert is called pLG7; the display of BPTI on the outer
surface of LG7 is verified by the methods of Sec. H.

M13am429 is an amber mutation of M13 used to reduce
non-specific binding by the affinity matrix for phages
derived from M13. M13am429 is derived by standard genetic
methods (MILL72) from wtM13. M13am429 is grown on E. coli
strain PE383 (F+, SupE, Rec-, AmpR) and harvested by the
standard method.

Phage LG7 is grown on E. coli strain PE384 in LB broth with
various concentrations of IPTG added to the medium to induce
the osp-ipbd gene. Phage LG7 is obtained from cells grown
with 0.0, 0.1, 1.0, 10.0 or 100.0 uM, or 1.0 mM IPTG,
harvested (See Sec. 7) by the method of Salivar (SALI64),
and concentrated to obtain a titre of 10^12 pfu/ml by the
method of Messing (MESS83).

The preferred method of determining whether LG7 displays
BPTI on its surface (See Sec. G) is to determine whether
these phage can retain a labeled derivative of trypsin (trp)
or anhydrotrypsin (AHTrp) on a filter that allows passage of
unbound trp or AHTrp. Trypsin contains 10 tyrosine residues
and can be iodinated with 125I by standard methods; we
denote the labeled trypsin as "trp*". Labeled anhydrotrypsin
is denoted as "AHTrp*". Other types of labels can be used on
trp or AHTrp, e.g. biotin or a fluorescent label. AHTrp* or
trp* is labeled to an activity of 0.3 uCi/ug. A sample of
10^12 LG7 (10 mM IPTG) is mixed with 1.0 ug of trp* or
AHTrp* in 1.0 ml of a buffer of 10 mM KCl, adjusted to
pH 8.0 with 1 mM K2HPO4/KH2PO4. The mixture is passed
through an Amicon MPS-1 system fitted with a membrane filter
that allows passage of proteins smaller than Mr = 100,000.
Filters are soaked in buffer containing trp or AHTrp prior
to the analysis. The filter is washed twice with 0.5 ml
buffer containing trp or AHTrp. The radioactivity retained
on the filter is quantitated with a scintillation counter or
other suitable device. If each virion displays one copy of
BPTI, then .05 ug of protein can be bound that would give
rise to 3 x 10^4 disintegrations / minute on the filter.
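
The expected signal can be checked with a short calculation. This sketch rests on assumptions not stated in the text: a trypsin molecular weight of about 23,300 g/mol, a specific activity read as 0.3 uCi/ug, and 10^12 virions in the sample. The function names are hypothetical.

```python
AVOGADRO = 6.022e23
DPM_PER_UCI = 2.22e6      # 1 uCi = 3.7e4 decays/s = 2.22e6 decays/min
TRYPSIN_MW = 23_300       # g/mol; assumed value for bovine trypsin

def bound_mass_ug(virions, copies_per_virion, mw=TRYPSIN_MW):
    """Mass of trypsin (ug) bound if each displayed BPTI binds one
    trypsin molecule."""
    moles = virions * copies_per_virion / AVOGADRO
    return moles * mw * 1e6

def expected_dpm(mass_ug, specific_activity_uci_per_ug):
    """Disintegrations per minute carried by the bound label."""
    return mass_ug * specific_activity_uci_per_ug * DPM_PER_UCI

mass = bound_mass_ug(1e12, 1)
print(round(mass, 3))                  # about 0.04 ug, near the text's .05 ug
print(round(expected_dpm(0.05, 0.3)))  # about 3 x 10^4 dpm
```

With roughly 30 copies per virion, as estimated elsewhere in this example, the retained signal would be correspondingly larger, so even one copy per virion should be readily detectable above background.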

An alternative way to quantitate display of BPTI on the
surface of LG7 is to use the stoichiometric binding between
trypsin and BPTI to titrate the BPTI. A solution that titers
10^12 pfu/ml of a phage is approximately 1.6 x 10^-9 M in
phage if each virion is infective. The ratio of pfu to total
phage can be determined spectrophotometrically using the
molar extinction coefficients at 260 nm and 280 nm corrected
for the increased length of LG7 as compared to wtM13. For
example, if a 1.0 ml solution that contains 10^12 pfu of LG7
phage grown with 1.0 mM IPTG inhibits trypsin solutions up
to 4.8 x 10^-8 M, we calculate that there are approximately
30 BPTIs/GP (i.e. (4.8 x 10^-8 moles of BPTI/l)/(1.6 x 10^-9
moles of phage/l)). Inhibition of a specified concentration
of trypsin is most easily measured spectrophotometrically
using a peptide-linked dye, such as N-alpha-benzoyl-Arg-Nan
(TSCH87).
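
The titration arithmetic can be reproduced directly. The sketch assumes, as the text does, that every virion is infective unless a measured pfu-to-particle ratio is supplied; the function names are hypothetical.

```python
AVOGADRO = 6.022e23

def phage_molarity(pfu_per_ml, infective_fraction=1.0):
    """Molar concentration of phage particles for a given titre.
    infective_fraction is the pfu-to-total-particle ratio."""
    particles_per_litre = pfu_per_ml / infective_fraction * 1000
    return particles_per_litre / AVOGADRO

def copies_per_phage(trypsin_inhibited_molar, phage_molar):
    """BPTI copies per particle, assuming 1:1 trypsin:BPTI binding."""
    return trypsin_inhibited_molar / phage_molar

molar = phage_molarity(1e12)
print(f"{molar:.2e}")                             # about 1.7e-09 M
print(round(copies_per_phage(4.8e-8, 1.6e-9)))    # 30 copies per phage
```

If only a fraction of the particles are infective, the particle concentration is higher than the titre suggests and the copies-per-phage estimate falls in proportion, which is why the spectrophotometric correction matters.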



Alternatively, binding to an affinity column may be used to
demonstrate the presence of BPTI on the surface of phage
LG7. An affinity column of 2.0 ml total volume having BioRad
Affi-Gel 10 (TM) matrix and 30 mg of AHTrp as affinity
material is prepared by the method of BioRad. The void
volume (Vv) of this column is, by hypothesis, 1.0 ml. This
affinity column is denoted {AHTrp}.

A sample of 10^12 M13am429 is applied to {AHTrp} in 1.0 ml
of 10 mM KCl buffered to pH 6.0 with KH2PO4/K2HPO4. The
column is then washed with the same buffer until the optical
density at 280 nm of the effluent returns to base line or
4 x Vv have been passed through the column, whichever comes
first. Samples of LG7 or LG10 are then applied to the
blocked {AHTrp} column at 10^12 pfu/ml in 1.0 ml of the same
buffer. The column is then washed again with the same buffer
until the optical density at 280 nm of the effluent returns
to base line or 4 x Vv have been passed through, whichever
comes first. Following this wash, a gradient of KCl from
10 mM to 2 M in 3 x Vv, buffered to pH 8.0 with phosphate,
is passed over the column. The first KCl gradient is
followed by a KCl gradient running from 2 M to 5 M in
3 x Vv. The second KCl gradient is followed by a gradient of
guanidinium Cl from 0.0 M to 2.0 M in 2 x Vv in 5 M KCl and
buffered to pH 8.0 with phosphate. Fractions of 50 ul are
collected and assayed for phage by plating 4 ul of each
fraction at suitable dilutions on sensitive cells. Retention
of phage on the column is indicated by appearance of LG7
phage in fractions that elute significantly later from the
column than control phage LG10 or wtM13. A successful
isolate of LG7 that displays BPTI is identified, the bpti
insert and junctions are sequenced, and this isolate is used
for further work described below. It is likely that a
significant fraction of clonal isolates from the same
ligation that are characterized as identical by restriction
digestion will similarly display BPTI.
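
The three elution gradients can be written as a schedule so that the eluent composition of any fraction is easy to look up. This is a sketch only: it assumes the gradients are linear (the text does not say), takes the void volume as the hypothesized 1.0 ml, and uses hypothetical names throughout.

```python
VV_ML = 1.0  # void volume, 1.0 ml by hypothesis in the text

# (length in void volumes, varied solute, start molarity, end molarity)
SEGMENTS = [
    (3, "KCl", 0.010, 2.0),
    (3, "KCl", 2.0, 5.0),
    (2, "guanidinium Cl", 0.0, 2.0),  # run in constant 5 M KCl
]

def eluent_at(volume_ml):
    """Return (solute, molarity) of the varied component after
    volume_ml of gradient has passed, assuming linear gradients."""
    start = 0.0
    for n_vv, solute, c0, c1 in SEGMENTS:
        span = n_vv * VV_ML
        if volume_ml <= start + span:
            frac = (volume_ml - start) / span
            return solute, c0 + frac * (c1 - c0)
        start += span
    raise ValueError("volume beyond the end of the gradients")

def fraction_count(fraction_ul=50):
    """Number of fractions collected over all three gradients."""
    total_ml = sum(seg[0] for seg in SEGMENTS) * VV_ML
    return round(total_ml * 1000 / fraction_ul)

print(eluent_at(1.5))      # midway through the first KCl gradient
print(fraction_count())    # 160 fractions of 50 ul
```

Recording the fraction number at which LG7 elutes, and converting it back to eluent composition this way, gives a simple measure of how tightly the displayed BPTI holds the phage on the column relative to the controls.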

If vgDNA is used to obtain a functional fusion between a
BPTI mutant and M13 CP (vide infra), then DNA from a clonal
isolate is sequenced in the regions that were variegated.
Then gratuitous restriction sites for useful restriction
enzymes are removed if possible by silent codon changes as
follows. A de novo piece of synthetic DNA is synthesized
such that the selected amino acid sequence is preserved and
cloned into pLG7. The sequence numbers of residues in
OSP-IPBD will be changed by any insertions; hereinafter, we
will, however, denote residues inserted after residue 23 as
23a, 23b, etc. Insertions after residue 81 will be denoted
as 81a, 81b, etc. This preserves the numbering of residues
between C5 in BPTI and C55 in BPTI. Residue C5 of BPTI is
always denoted as 28 in the fusion; residue C55 of BPTI is
always denoted as 78 in the fusion, and the intervening
residues have constant numbers.



Should LG7 phage from cells grown with 10 mM IPTG fail to
display BPTI on its surface, we have several options. We
might try to determine why the construction failed to work
as expected. There are various possible modes of failure,
including: a) BPTI is not cleaved from the M13 signal
sequence, b) BPTI is cleaved from the M13 CP, and c) the
chimeric protein is made and cleaved after the signal
sequence, but the processed protein is not incorporated into
the M13 coat. BPTI has been secreted from E. coli (MARK86);
however the M13 coat-protein signal sequence was not used.
Therefore problems stemming from the signal sequence are
unlikely, but possible. We could determine whether BPTI was
present in the periplasm or bound to the inner membrane of
LG7-infected cells by assays using labeled trypsin or
anhydrotrypsin.

Proteins in the periplasm can be freed through spheroplast
formation using lysozyme and EDTA in a concentrated sucrose
solution (BIRD67, MALA64). If BPTI were free in the
periplasm, it would be found in the supernatant. Trypsin
labeled with 125I would be mixed with supernatant and passed
over a non-denaturing molecular sizing column and the
radioactive fractions collected. The radioactive fractions
would then be analyzed by SDS-PAGE and examined for
BPTI-sized bands by silver staining.

Spheroplast formation exposes proteins anchored in the
inner membrane. Spheroplasts would be mixed with AHTrp* and
then either filtered or centrifuged to separate them from
unbound AHTrp*. After washing with hypertonic buffer, the
spheroplasts would be analyzed for extent of AHTrp* binding.

If BPTI were found free in the periplasm, then we would
expect that the chimeric protein was being cleaved both
between BPTI and the M13 mature coat sequence and between
BPTI and the signal sequence. In that case, we should alter
the BPTI/M13 CP junction by inserting vgDNA at codons for
residues 79-82 of AA_seq2.



If BPTI were found attached to the inner membrane, then two
hypotheses can be formed. The first is that the chimeric
protein is being cut after the signal sequence, but is not
being incorporated into LG7 virion; the treatment would also
be to insert vgDNA between residues 78 and 82 of AA_seq2.
The alternative hypothesis is that BPTI could fold and react
with trypsin even if signal sequence is not cleaved.
N-terminal amino acid sequencing of trypsin-binding material
isolated from cell homogenate determines what processing is
occurring. If signal sequence were being cleaved, we would
use the procedure above to vary residues between C78 and
A82; subsequent passes would add residues after residue 81.
If signal sequence were not being cleaved, we would vary
residues between 23 and 27 of AA_seq2. Subsequent passes
through that process would add residues after 23.

If BPTI were found neither in the periplasm nor on the
inner membrane, then we would expect that the fault was in
the signal sequence or the signal-sequence-to-BPTI junction.
The treatment in this case would be to vary residues between
23 and 27.
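
The diagnostic reasoning above amounts to a small decision table mapping where BPTI is found to the region of AA_seq2 that would be variegated. The sketch below restates that table; the function and argument names are hypothetical, and the residue ranges are those given in the text.

```python
def residues_to_vary(location, signal_cleaved=None):
    """Return the (first, last) residue numbers of AA_seq2 bounding
    the region to variegate, given where BPTI was detected.

    location: "periplasm", "inner membrane", or "not found"
    signal_cleaved: result of N-terminal sequencing; needed only
    when BPTI is found on the inner membrane.
    """
    if location == "periplasm":
        # Cleaved at both junctions: alter the BPTI/M13 CP junction.
        return (79, 82)
    if location == "inner membrane":
        if signal_cleaved is None:
            raise ValueError("N-terminal sequencing result is required")
        # Cleaved but not incorporated: vary the BPTI/CP junction;
        # not cleaved: vary the signal/BPTI junction.
        return (78, 82) if signal_cleaved else (23, 27)
    if location == "not found":
        # Fault in the signal sequence or signal-to-BPTI junction.
        return (23, 27)
    raise ValueError(f"unknown location: {location!r}")

print(residues_to_vary("periplasm"))
print(residues_to_vary("inner membrane", signal_cleaved=False))
```

Every branch leads to one of only two regions, which is the observation that motivates trying both variegations directly rather than running the analysis first.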



Analytical experiments to determine what has gone wrong
take time and effort and, for the foreseen outcomes,
indicate variations in only two regions. Therefore, we
believe it prudent to try the synthetic experiments
described below without doing the analysis. For example,
these six experiments that introduce variegation into the
bpti-gene VIII fusion could be tried:

1) 3 variegated codons between residues 78 and 82 using
   olig#12 and olig#13,

2) 3 variegated codons between residues 23 and 27 using
   olig#14 and olig#15,

3) 5 variegated codons between residues 78 and 82 using
   olig#13 and olig#12a,

4) 5 variegated codons between residues 23 and 27 using
   olig#15 and olig#14a,

5) 7 variegated codons between residues 78 and 82 using
   olig#13 and olig#12b, and

6) 7 variegated codons between residues 23 and 27 using
   olig#15 and olig#14b.

To alter the BPTI-M13 CP junction, we introduce DNA
variegated at codons for residues between 78 and 82 into the
Sph I and Sfi I sites of pLG7. The residues after the last
cysteine are highly variable in amino acid sequences
homologous to BPTI, both in composition and length; in
Table 25 these residues are denoted as G79, G80, and A81.
The first part of the M13 CP is denoted as A82, E83, and
G84. One of the oligo-nts olig#12, olig#12a, or olig#12b and
the primer olig#13 are synthesized by standard methods. The
oligo-nts are:



residue        75  76  77  78  79  80  81  82  83
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|GCT|GAA|-

         84  85  86  87  88  89  90  91
        GGT|GAT|GAT|CCG|GCC|AAA|GCG|GCC|gcg|cc 3'  olig#12

residue        75  76  77  78  79  80  81 81a 81b
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|qfk|qfk|-

         82  83  84  85  86  87
        GCT|GAA|GGT|GAT|GAT|CCG|-

         88  89  90  91
        GCC|AAA|GCG|GCC|gcg|cc 3'  olig#12a

residue        75  76  77  78  79  80  81 81a 81b
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|qfk|qfk|-

        81c 81d  82  83  84  85  86  87
        qfk|qfk|GCT|GAA|GGT|GAT|GAT|CCG|-

         88  89  90  91
        GCC|AAA|GCG|GCC|gcg|cc 3'  olig#12b

residue    91  90  89  88  87  86
5' gg|cgc|GGC|CGC|TTT|GGC|CGG|ATC 3'  olig#13

where q is a mixture of (0.26 T, 0.18 C, 0.26 A, and
0.30 G), f is a mixture of (0.22 T, 0.16 C, 0.40 A, and
0.22 G), and k is a mixture of equal parts of T and G. The
bases shown in lower case at either end are spacers and are
not incorporated into the cloned gene. The primer is
complementary to the 3' end of each of the longer oligo-nts.
One of the variegated oligo-nts and the primer olig#13 are
combined in equimolar amounts and annealed. The dsDNA is
completed with all four (nt)TPs and Klenow fragment. The
resulting dsDNA and RF pLG7 are cut with both Sfi I and
Sph I, purified, mixed, and ligated. This ligation mixture
goes through the process described in Sec. 15 in which we
select a transformed clone that, when induced with IPTG,
binds AHTrp.

To vary the junction between M13 signal sequence and BPTI,
we introduce DNA variegated at codons for residues between
23 and 27 into the Kpn I and Xho I sites of pLG7. The first
three residues are highly variable in amino acid sequences
homologous to BPTI. Homologous sequences also vary in length
at the amino terminus. One of the oligo-nts olig#14,
olig#14a, or olig#14b and the primer olig#15 are synthesized
by standard methods. The oligo-nts are:



residue : 17 i3 19 2 0 2 1 2 2 2 3 2 4 2 3 

5' g|gcc|gcClGT.\ICCC|ATClCTClTrrlTTT;GCT|qfX.iqf;:; - 

2e 27 23 29 30 
!qfk|TTC;TCr!CTC;CAClcgc|ccg!cga' 3' o11ct14 



residue 17 13 10 20 2 1 22 23 24 ,"5 25 
' g I gcc 1 qcC I GTA ; CC3 j ATC 1 CTC I TCT i TTT ; OCT [ q r< I q 1 >: ; c f :< j - 

26a 26b 27 28 29 30 
iqfklqfklTTClTG-ICTClCACicgv-jccgic^ai 3' Qli- = l^a, 



residue 17 la 19 20 21 22 23 ?.4 26 
5 'g : gcc | gcC ; CTAl CCG [ ATG [ CTC [ TCT | TTT GCT; qf k I qi V. : qf k | - 

26a 26b 26c 2ed 27 23 29 30 
iqfV:|qfk|qfklqf:-:jTTClTGT(CTCiCAG!cq=-_-cglcgai 3'olig = 14b 



tcg|cgo|gcg|CTC|CAGiACA|CP^! 3' cligrlS 



where q is a mixture of (0.26 T, 0.18 C, 0.26 A, and
0.30 G), f is a mixture of (0.22 T, 0.16 C, 0.40 A, and
0.22 G), and k is a mixture of equal parts of T and G.
The bases shown in lower case at either end are spacers
and are not incorporated into the cloned gene. One of
the variegated oligo-nts and the primer are combined in
equimolar amounts and annealed. The dsDNA is
completed with all four (nt)TPs and Klenow fragment.
The resulting dsDNA and RF pLG7 are cut with both Kpn I
and Xho I, purified, mixed, and ligated. This ligation
mixture goes through the process described in Sec. 15
in which we select a transformed clone that, when
induced with IPTG, binds AHTrp or trp.
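As an illustration of what one variegated qfk codon encodes, the expected amino-acid distribution can be tabulated from the base mixtures. The sketch below is not part of the disclosed method; it assumes the standard genetic code, independent incorporation of each base, and the q, f, and k proportions as read from the text above. The function and variable names are hypothetical.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each position; "*" is stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Base mixtures as read from the text (assumed; each sums to 1.0).
Q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}  # first codon position
F = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}  # second codon position
K = {"T": 0.50, "G": 0.50}                        # third codon position


def codon_distribution(first, second, third):
    """Return {amino acid: probability} for one variegated codon."""
    dist = {}
    for (b1, p1), (b2, p2), (b3, p3) in product(
        first.items(), second.items(), third.items()
    ):
        aa = CODON_TABLE[b1 + b2 + b3]
        dist[aa] = dist.get(aa, 0.0) + p1 * p2 * p3
    return dist


qfk = codon_distribution(Q, F, K)
for aa, p in sorted(qfk.items(), key=lambda item: -item[1]):
    print(f"{aa}: {p:.4f}")
```

Because the third base is restricted to T or G, the only stop codon that can arise is TAG; under these assumed proportions its frequency is 0.26 x 0.40 x 0.5.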

Other numbers of variegated codons could be used.

If none of these approaches produces a working
chimeric protein, we may try a different signal
sequence. If that doesn't work, we may try a different
OSP in M13 because the structural data clearly indicate
that BPTI could not be joined to the carboxy terminus.
The next best OSP of M13 is the gene III protein
because there is fusion data (SMIT85, CRUZ88).



Example 1, Part II



BPTI binds very tightly to trypsin
(Kd = 6.0 x 10^-14 M) and to anhydrotrypsin, so that
these molecules are not preferred for optimizing the
amount of BPTI to display on LG7 or the amount of
affinity molecule to attach to the column. Tschesche
et al. reported on the binding of several BPTI
derivatives to various proteases:
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Dissociation cor.stan-s for riPTI derivatives, .Mclar. 



Residue #15: lysine, glycine, alanine, valine, leucine

Proteases: Trypsin (bovine pancreas); Chymotrypsin
(bovine pancreas); Elastase (porcine pancreas);
Elastase (human leukocytes)



6.0 X 10 



9.0 >: 10 



-9 



3.5 X 10 



-6 



7.0 >; 10 



-9 



2.8 
5.7 
1.9 



X 10" 



10" 



10 



-8 



2.5 X 
1-. 1 xl 
2.9 X 10 



10-y 

rj-lO 

9 



From the report of Tschesche et al. we infer that
molecular pairs marked have Kds greater than
3.5 x 10^-6 M and that molecular pairs marked have
Kds much greater than 3.5 x 10^-6 M. Because of the
wealth of data about the binding of BPTI and various
mutants to trypsin and other proteases (TSCH87), we can
proceed in various ways. (For other PBDs we can obtain
two different monoclonal antibodies, one with a high



M, and cnu with a 



affinity having K-* p: order 10'^* 
moderate affinity having Kj on the oraer of 10"? V.} 
In this example, we may use: a) the moderate binding
between BPTI and human leukocyte elastase (HuLE1), b)
the moderately strong binding of porcine elastase to
BPTI(V15), or c) the binding of BPTK^IS) (residi? 26 
in the pbd gene) fcr trypsin (weak but cetectablel cr 
for porcine pancreatic clastase. 

Following the teachings of Sec. 10, we compare the
retention of LG7 virions to the retention of wild-type
M13 on {AHTrp}. M13 derivatives having more DNA than
wild-type M13 have correspondingly longer virions. Thus
we will create pLG8 that differs from pLG7 only in
having stop codons at codons 2 and 3, and an altered
codon at codon 7 of the osp-pbd gene. Phage LG8 will
have exactly as much DNA as LG7; therefore the LG8
virion is exactly as long as the LG7 virion. LG8 can
not, however, display BPTI on its surface. To generate
these mutations we synthesize the oligo-nt

5' (121) |aac|gct iagc|ctt|Cag| aacicag(aga I ttA|ctA|cat| - 
10 |agt:igag|cct| (SO) 3' oUg = ll 



that is complementary to bases 80 through 121 of the
ipbd gene, shown in Table 23, except for the three
upper-case, underscored bases. Olig#11 and the
primers olig#24, olig#25, and olig#26 are annealed to
circular ssDNA from LG7. Klenow fragment (from US
Biochemical) and all four (nt)TPs are used to complete
the circular dsDNA. After treatment with Klenow
fragment, the dsDNA is treated with ligase. Cells are
transformed with the ligated dsDNA and, after a short
grow-out, the cells are plated on ampicillin-containing
LB agar. By changing the third base in codon 7, we
have destroyed the unique Afl II site in pLG7. Thus we
can screen colonies for loss of the Afl II site. To
ccnfirn the construction, C!<*A fron plaques with nc 
Af 1 II site are sequenced fro- about base :-;o to about 
base 40 of the c sp- \ cbd gene. 



To expedite identification of different M13-
derived phage, we replace the amp^R gene of LG8 with the
tet^R gene from pBR322. Plasmid pBR322 is cut with
Bsm I at the unique site at 1353 and the linearized DNA
is blunted with Klenow fragment and purified. The
blunt DNA is cut with Aat II and the 1423-base tet^R-
bearing fragment purified by agarose gel
electrophoresis or HPLC. Plasmid pLG8 dsDNA is cut
with Xba I at the unique site and the linearized DNA is
blunted with Klenow fragment and purified. The linear,
blunt DNA is digested with Aat II and the 7.8 kb
fragment is isolated. The two isolated DNA fragments
are mixed, annealed, ligated, and used to transform
competent E. coli cells. The transformed cells are
selected with tetracycline. The correct construction
contains Sal I, EcoR I, and EcoR V sites, but LG8
contains none of these. The correct construction,
having 9.2 kb, is easily distinguished from pBR322 and
is called LG10. DNA from phage LG10 is sequenced in
the vicinity of the junctions of the newly inserted
tet^R gene to confirm the construction.

The phage LG7 is grown at various levels of IPTG
in the medium and harvested in the way previously
described. An affinity column having bed volume 2.0
ml and supporting an amount of HuLE1 picked from the
range 0.1 mg to 30.0 mg on 1 ml of BioRad Affi-Gel
10(TM) or Affi-Gel 15(TM) is designated {HuLE1}.
An appropriate set of densities of HuLE1 on the
column is (0.1 mg/ml, 0.5 mg/ml, 2.0 mg/ml, 3.0 mg/ml,
13.0 mg/ml, and 30.0 mg/ml). The Vv of {HuLE1} is, by
hypothesis, 1.0 ml. The elution of LG7 phage is
compared to the elution of LG10 on {HuLE1} having
varying amounts of HuLE1 affixed. The columns are
eluted in a standard way:

1) 10 mM KCl buffered to pH 8.0 with phosphate,
until optical density at 280 nm falls to base line
or 4 x Vv, whichever is first,

2) a gradient of 10 mM to 2 M KCl in 3 x Vv, pH
held at 8.0 with phosphate,

3) a gradient of 2 M to 5 M KCl in 3 x Vv,
phosphate buffer to pH 8.0,

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl
in 2 x Vv, with phosphate buffer to pH 8.0.

The preferred level of induction (IPTGoptimal)
and the amount of affinity molecule on the matrix
(DoAMoMoptimal) are those settings that give the
sharpest LG7 elution peak that shows significant
retardation as compared to LG8, which carries no BPTI.
By hypothesis, the best separation occurs for the
amount of BPTI/GP produced when the cells are induced
with 10.0 uM IPTG and when 4.0 mg HuLE1/ml is applied
to BioRad Affi-Gel 10(TM).

When the amount of BPTI/GP and the amount of
HuLE1/volume of support have been optimized, we turn to
optimization of elution rate, initial ionic strength,
and the amount of GP/(volume of support). These
parameters can be optimized separately.

Using optimal BPTI/GP and HuLE1/volume of support,
we measure the elution volume of LG7 and LG8 for
different elution rates, viz. 1, 1/2, 1/4, 1/8 and 1/16
times the maximum flow rate. M13 is shear resistant,
so that the pressure that can be applied across the
column is limited only by the mechanical properties of
the support material. By hypothesis, 1/4 of maximum
elution rate is better than 1/2, but 1/8 is about the
same as 1/4. Therefore 1/4 maximum elution rate will
be used.

Elution volumes of LG7 obtained from cells grown
on media that is 2.0 mM in IPTG are measured at optimal
DoAMoM and elution rate for loadings of 10^9, 10^10,
10^11, and 10^12 pfu. By hypothesis, 10^12 pfu of
LG7 overloads the column and a significant number of
phage elute before their characteristic position in the
KCl gradient. We also find that 10^11 pfu overloads the
column only slightly, and that 10^10 pfu does not
overload the column. Because the use of the affinity 
separation in Sec. 15 will involve a population in 
which no single mcnber is .■::ore than one part in 10^, we 
conclude that 10^2 of ^ variegated population could 

10 be applied to a colunn of 1.0 ml matrix volume without 
overloading with respect any one species. The 
overloading of a 1.0 ml colunn by 10^^ , pfu also 
indicates that the initial colunn that captures 
indiscriminately adhesive phage should be 5 to 10 times 

15 _ as large as the column that supports the target 
materia 1 . 

Elution volumes of LG7 and LG10 obtained from
cells grown on media that is 2.0 mM in IPTG are
measured at optimal DoAMoM and elution rate and for a
loading of 10^10 pfu for various initial ionic
strengths: 1.0 mM, 5.0 mM, 10.0 mM, 20.0 mM, and 50.0
mM. We find that LG10 is slightly retarded by the
column when loaded at 1.0 mM KCl, but that LG7 always
comes off the column at its characteristic place in the
gradient. We use 10.0 mM as initial ionic strength in
all remaining affinity separations.
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To determine the sensitivity of chromatography of 
phage that display variants of BPTI on their surfaces
(Sec. 10.1), we prepare artificial mixtures of two
closely-related phage that differ only at one residue
in the BPTI domain. One variety of phage has strong
affinity for the column used in this step, while the
other phage has no affinity for the column. We
chromatograph these mixtures to discover how little of
the phage that binds to the column can be detected
within a large majority of phage that do not bind the
column.

For these tests we choose AHTrp as AfM(BPTI). A
column having 2 ml bed volume is prepared with
(DoAMoMoptimal of AHTrp)/(ml of Affi-Gel 10(TM)).
The column is called {AHTrp} and has Vv = 1.0 ml.



A new phage, LG9, is prepared that displays
BPTI(V15) as IPBD in contrast to LG7 that displays
BPTI(K15, wild-type) as IPBD. Residue 15 of BPTI is
residue 38 of the osp-ipbd gene. We introduce the
change K38 to V by replacement of a short segment of
the osp-ipbd gene. The two oligo-nts
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gtt'cct CCt|ATa| 

CAA 1 cqA I gcA | taT I 

(BssH II) 



r I i 1 i 

CCt I ATa I ATa 
tar 



r 

<3 
CGc 
gcC 



y I t 

TAT I TTC 
ata j aag 
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Jc 


a 
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51 


TAC 


AAC 


CCT 


AAA 


CCA 


cc 


atg 


ttg 


cga 


ttt 


cgt 


cc 



i stu X 



3' oUg = 16 
5' oligsl? 



are synthesized by standard methods and annealed; the
lower case letters in olig#16 and the upper case
letters in olig#17 are mutant with respect to pLG7.
Plasmid pLG7 DNA is digested with both Apa I and Stu I
and the large piece purified. The ds oligo-nt is added
to the purified backbone of pLG7 and ligated; the
ligated DNA is used to transform competent cells.
After a short grow-out, the cells are plated on
ampicillin-containing plates and Amp^R colonies are
picked. The mutations destroy the unique BssH II site;
thus we can screen colonies through restriction
digestion. To confirm the construction, DNA from
colonies having the correct restriction digestion
pattern is sequenced from about 10 bases above the
Stu I site to about 10 bases below the Apa I site. The
correct construction is called pLG9.

To expedite differentiation between LG7 and an
LG9-derivative phage, we replace the amp^R gene of LG9
with the tet^R gene from pBR322. Plasmid pBR322 is cut
with Bsm I at the unique site at 1353 and the
linearized DNA is blunted with Klenow fragment and
purified. The blunt DNA is cut with Aat II and the
1423-base tet^R-bearing fragment purified by agarose gel
electrophoresis or HPLC. Plasmid pLG9 dsDNA is cut
with Xba I at the unique site and the linearized DNA is
blunted with Klenow fragment and purified. The linear,
blunt DNA is digested with Aat II and the 7.8 kb
fragment is isolated. The two isolated DNA fragments
are mixed, annealed, ligated, and used to transform
competent E. coli cells. The transformed cells are
selected with tetracycline. The correct construction
contains Sal I, EcoR I, and EcoR V sites, but LG9
contains none of these. The correct construction,
having 9.2 kb, is easily distinguished from pBR322 and
is called LG11. DNA from phage LG11 is sequenced in
the vicinity of the junctions of the newly inserted
tet^R gene to confirm the construction.
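The stated size of the correct construction can be checked against the two fragments that are ligated. The sketch below is only an arithmetic check; it takes the 7.8 kb backbone from the text and treats the tet-bearing fragment as roughly 1.4 kb, which is an approximation introduced here. The function name is hypothetical.

```python
def ligated_size_kb(*fragments_kb):
    """Total size of a construct assembled from the given fragments (kb)."""
    return sum(fragments_kb)


BACKBONE_KB = 7.8      # Aat II / blunted Xba I backbone, from the text
TET_FRAGMENT_KB = 1.4  # approximate size of the tet-bearing fragment (assumed)

total_kb = ligated_size_kb(BACKBONE_KB, TET_FRAGMENT_KB)
print(f"expected construct size: {total_kb:.1f} kb")
```

The sum agrees with the 9.2 kb quoted for the correct construction.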
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ICl and LGll arc grown with optir.un IPTC (2.0 nM) 
and harvested. Mixtures ore prepared in the ratios 

LC7:LCll :: 




where V||p ranges fron 10^^ to 10^ by factors of 10. 
Large values of Vlim are tested first; once a Vlim is
found that allows recovery of LG7, smaller values of
Vlim are tested. Once a value of Vlim is found
that allows recovery of LG7, we test values that are
larger by 2- or 3-fold so that Vlim is determined
within a factor of 2.

The column (AHTrp; is first blocked by treatment 
with 10^^ virions of Ml3a::429 in 100 ul of 10 oH KCl 
buffered to pH 8.0 with phosphate; the column is washed
with the same buffer until OD260 returns to base line
or 4 x Vv have passed through the column, whichever
is first. One of the mixtures of LG7 and LG11
lining 10^^ pfu in 1 nl of the same buffer is 
applied to ( AHTrp | . T-he column is eluted in a standard 
way : 



1) 10 mM KCl buffered to pH 8.0 with phosphate,
until optical density at 280 nm falls to base line
or 4 x Vv, whichever is first (discard effluent),

2) a gradient of 10 mM to 2 M KCl in 3 x Vv, pH
held at 8.0 with phosphate (30 x 100 ul
fractions),

3) a gradient of 2 M to 5 M KCl in 3 x Vv,
phosphate buffer to pH 8.0 (30 x 100 ul
fractions),

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl
in 2 x Vv, with phosphate buffer to pH 8.0 (20 x
100 ul fractions),

5) constant 5 M KCl plus 0.8 M guanidinium Cl in
1.2 x Vv, with phosphate buffer to pH 8.0 (12 x
100 ul fractions).
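The fraction counts in the elution steps follow from the step volumes. The sketch below reproduces that arithmetic under the assumption, taken from the text, that Vv for {AHTrp} is 1.0 ml and that every collected fraction is 100 ul. The names are hypothetical.

```python
VOID_VOLUME_ML = 1.0  # Vv of the {AHTrp} column, from the text
FRACTION_UL = 100.0   # volume of each collected fraction

# Elution volume of each collected step, in multiples of Vv.
STEP_VOLUMES_VV = {2: 3.0, 3: 3.0, 4: 2.0, 5: 1.2}


def fractions_per_step(multiple_of_vv, void_ml=VOID_VOLUME_ML,
                       fraction_ul=FRACTION_UL):
    """Number of fractions collected when a step runs for multiple_of_vv x Vv."""
    return round(multiple_of_vv * void_ml * 1000.0 / fraction_ul)


fraction_counts = {
    step: fractions_per_step(multiple)
    for step, multiple in STEP_VOLUMES_VV.items()
}
print(fraction_counts)
print("total fractions:", sum(fraction_counts.values()))
```

Step 1 is omitted because its effluent is discarded rather than collected.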



Samples of 4 ul from each fraction are plated at
suitable dilution on phage-sensitive Sup- cells (so
that M13am429 will not grow). In addition to the
effluent fractions, a sample is removed from the column
and used as an inoculum for phage-sensitive Sup- cells.
Plaques are transferred to ampicillin-containing LB
agar. Colonies that are ampicillin-resistant are
tested for display of BPTI(K15) by use of trp* or
AHTrp*. Testing begins with colonies obtained by
culturing an inoculum from the column, proceeds to the
last effluent fraction, and works backwards toward
earlier fractions. Once a positive colony is found, no
further tests are required for that value of Vlim. If
no BPTI-positive colonies are detected, the population
of phage obtained from the column matrix and the last
few (e.g. 5 to 10) phage-bearing fractions are merged
and cultured. Phage are harvested from this culture
and chromatographed by the above procedure. This
process continues until a positive colony is isolated
or Nchrom passes of chromatography and growth have been
completed. If no positive colonies are detected after
Nchrom passes of enrichment, Vlim is reduced by a
suitable factor and the process is repeated.



By hypothesis, Vlim = 4.0 x 10^8 is the largest
value for which LG7 can be recovered. Thus Csensi =
4.0 x 10^8. Three cycles of chromatography are required
to isolate LG7, so the first approximation to Ceff is
740 ( = exp( log_e(4.0 x 10^8)/3 ) ).
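The first approximation to Ceff is the per-cycle enrichment that, compounded over the number of cycles, gives Csensi. The sketch below evaluates that expression; the function name is hypothetical, and it assumes every cycle enriches by the same factor.

```python
import math


def per_cycle_enrichment(c_sensi, n_cycles):
    """Per-cycle enrichment, assuming each cycle enriches equally."""
    return math.exp(math.log(c_sensi) / n_cycles)


c_eff = per_cycle_enrichment(4.0e8, 3)
print(f"Ceff is approximately {c_eff:.0f}")
```

The result is close to 737, which the text rounds to 740.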
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We now deternine the efficiency of the affinity 
separation (Sec. 10.2). This is done by: a) preparing
mixtures of LG7 and LG11 in the ratio 1:Q, b) enriching
the population for LG7 for one separation cycle, and c)
determining the fraction of LG7 in the last phage-
bearing fraction. The phage are obtained from cultures
induced at 10.0 uM IPTG, the optimal level. Q is
decreased until roughly half the phage are LG7. We
start with Q = 1.5 x 10^4 = 20 x approximate Ceff. The
mixture is applied to a {AHTrp} column bearing 4.0 mg
AHTrp on 1.0 ml of Affi-Gel 10 (the optimal DoAMoM) and
eluted in the specified manner. A sample of 4 ul from
each fraction is plated at suitable dilution on phage-
sensitive cells on LB agar. The identity of colonies
in the last phage-bearing fraction is determined by
transferring colonies to ampicillin-containing and
tetracycline-containing plates; colonies that show Tet^R
are from LG11 and colonies that show Amp^R are from LG7.
When Q is 1.5 x 10*^. j^. of colonics are BPTI positive. 

0 When Q is 1.5 x 10^, 50^ of the colonies are BPTI 
positive. Thus we calculate ^^eff ~ -^^ ^ ^-^ ~ 
000. 



Myoglobin is strongly colored and it is possible
that binding of HHMb to M13 could provide enough
optical absorption to allow FACS sorting of M13 that
bind HHMb (See Sec. 10.4).

We have now constructed LG7 that displays one or
more BPTI domains on each virion. The osp-ipbd gene is
under control of the lacUV5 promoter so that expression
levels of BPTI-M13 CP can be manipulated via [IPTG].
This construct may be used to develop many different
binding proteins, all based on BPTI. An optimum level
of induction has been determined. An optimum amount of
AfM(PBD), DoAMoMoptimum = 4.0 mg/(ml of support), has
been determined; target molecules will be applied to
columns at this level in the process disclosed in Sec.
15.1. These optimum levels may be adequate for all
targets and all variegations of BPTI displayed on
derivatives of M13 based on LG7, but some further
optimization may be needed if other values of pH or
temperature are used.

Other pbd gene fragments may be substituted for
the bpti gene fragment in pLG7 with a high likelihood
that PBD will appear on the surface of the new LG7
derivative.



HHMb is chosen as a typical protein target; any
other protein could be used. HHMb satisfies all of the
criteria for a target: 1) it is large enough to be
applied to an affinity matrix, 2) after attachment it
is not reactive, and 3) after attachment there is
sufficient unaltered surface to allow specific binding
by PBDs.

The essential information for HHMb is known: 1)
HHMb is stable at least up to 70 °C, between pH 4.4 and
9.3, 2) HHMb is stable up to 1.6 M guanidinium Cl, 3)
the pI of HHMb is 7.0, 4) for HHMb, Mr = 17,000, 5)
HHMb requires haem, 6) HHMb has no proteolytic
activity.



In addition, the following information about HHMb
and other myoglobins is available: 1) the sequence of
HHMb is known, 2) the 3D structure of sperm whale
myoglobin is known; HHMb has 19 amino acid differences
and it is generally assumed that the 3D structures are
almost identical, 3) HHMb has no enzymatic activity, 4)
HHMb is not toxic.

We set the specifications of an SBD as:

1) T = 25 °C

2) pH = 8.0

3) Acceptable solutes:

A) for binding:

i) phosphate, as buffer, 0 to 20 mM, and

ii) KCl, 10 mM,

B) for column elution:

i) phosphate, as buffer, 0 to 30 mM,

ii) KCl, up to 5 M, and

iii) Guanidinium Cl, up to 0.8 M.

4) Acceptable Kd < 1.0 x 10^-8 M.

We choose LG7 as GP(IPBD).
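For bookkeeping, the specification above can be written as a small data structure and used to check a proposed elution condition or a measured dissociation constant. This is a hypothetical sketch: the class, field, and function names are invented here, and only the numerical limits come from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElutionLimits:
    """Upper bounds for column-elution solutes listed in the specification."""
    phosphate_mM: float = 30.0
    kcl_M: float = 5.0
    guanidinium_M: float = 0.8


KD_MAX_M = 1.0e-8  # an acceptable Kd must be below this value


def elution_within_limits(phosphate_mM, kcl_M, guanidinium_M,
                          limits=ElutionLimits()):
    """True if every solute concentration lies within the specified range."""
    return (0.0 <= phosphate_mM <= limits.phosphate_mM
            and 0.0 <= kcl_M <= limits.kcl_M
            and 0.0 <= guanidinium_M <= limits.guanidinium_M)


def meets_affinity_spec(kd_M):
    """True if the dissociation constant satisfies the Kd specification."""
    return kd_M < KD_MAX_M


print(elution_within_limits(10.0, 5.0, 0.8))
print(meets_affinity_spec(3.5e-6))
```

The second call uses the 3.5 x 10^-6 M value quoted earlier for a moderate binder, which falls outside the target affinity.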

As stated in Sec. 13.1, the residues to be varied
are picked, in part, through the use of interactive
computer graphics to visualize the structures. In this
section, all residue numbers refer to BPTI. We pick a
set of residues that forms a surface such that all
residues can contact one target molecule. Information
that we refer to during the process of choosing
residues to vary includes: 1) the 3D structure of BPTI,
2) solvent accessibility of each residue as computed by
the method of Lee and Richards (LEEB71), 3) a
compilation of sequences of other proteins homologous
to BPTI, and 4) knowledge of the structural nature of
different amino acid types.
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Tables 16 and 34 indicate which residues of B?TI : 
a) have substantial surface exposure, and b) are known
to tolerate other amino acids in other closely related
proteins. We use interactive computer graphics to pick
sets of eight to twenty residues that are exposed and
variable and such that all members of one set can touch
a molecule of the target material at one time. If BPTI
has a small amino acid at a given residue, that amino
acid may not be able to contact the target
simultaneously with all the other residues in the
interaction set, but a larger amino acid might well
make contact. A charged amino acid might affect
binding without making direct contact. In such cases,
the residue should be included in the interaction set,
with a notation that larger residues might be useful.
In a similar way, large amino acids near the geometric
center of the interaction set may prevent residues on
either side of the large central residue from making
simultaneous contact. If a small amino acid, however,
were substituted for the large amino acid, then the
surface would become flatter and residues on either
side could make simultaneous contact. Such a residue
should be included in the interaction set with a
notation that small amino acids may be useful.

Table 35 was prepared from standard model parts
and shows the maximum span between C_beta and the tip of
each type of side group. C_beta is used because it is
rigidly attached to the protein main-chain; rotation
about the C_alpha-C_beta bond is the most important
degree of freedom for determining the location of the
side group.
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Table 34 
indicates five surfaces that meet the given criteria.
The first surface comprises the set of
residues that actually contacts trypsin in the complex
of trypsin with BPTI as reported in the Brookhaven
Protein Data Bank entry "1TPA". This set is indicated
by the number "1". The exposed surface of the residues
in this set (taken fron Table 16) totc^ls ll-:8 A^. 
Although this is not strictly the area of contact
between BPTI and trypsin, it is approximately the same.

Other surfaces, numbered 2 to 5, were picked by
first picking one exposed, variable residue and then
picking neighboring residues until a surface was
defined. The choice of sets of residues shown in Table
34 is in no way exhaustive or unique; other sets of
variable, surface residues can be picked. Set #2 is
shown in stereo view, Figure 10, including the alpha
carbons of BPTI, the disulfide linkages, and the side
groups of the set. We take the orientation of BPTI in
Figure 10 as a standard orientation, and hereinafter
refer to K15 as being at the top of the molecule, while
the carboxy and amino termini are at the bottom.

Solvent accessibilities are useful, easily
tabulated indicators of a residue's exposure. Solvent
accessibilities must be used with some caution; small
amino acids are under-represented and large amino acids
over-represented. The user must consider what the
solvent accessibility of a different amino acid would
be when substituted into the structure of BPTI.

To create specific binding between a derivative of
BPTI and HHMb, we will vary the residues in set #2.
This set includes the twelve principal residues 17(R),
19(I), 21(Y), 27(A), 28(G), 29(L), 31(Q), 32(T), 34(V),
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Residue 21 is not very variable, containing F or Y 
in 31 of 33 ca::cs and I and W in tne remaining cases. 
30 The side group of Y2 1 fills the space between T32 and 
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\:': } vary thea with a high probability of retaining the f?...- ^ 

5 underlying structure. Independent substitution at each. t'>.:ri 
of these twelve residues of the amino acid types ^ ^lu*^ 

observed at that residue would produce approxinately 
4.4 X 10^ amino acid sequences and the same nuaber of 
surfaces. 
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48(A), 49(E), and 52(M) (Sec. 13.1.1). None of the \. 
residues in set #2 is completely conserved in the
sample of sequences reported in Table 34; thus we can
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DPTI IS a very basic protein. This property has 
been used in isolating and purifying BPTI and its
homologues so that the high frequency of arginine and
lysine residues may reflect bias in isolation and is
not necessarily required by the structure. Indeed,
SCI-III from Bombyx mori contains seven more acidic
than basic groups (SASA84).

Residue 17 is highly variable and fully exposed
and can contain R, K, A, Y, H, F, L, M, T, C, Y, P, or
S. All types of amino acids are seen: large, small,
charged, neutral, and hydrophobic. That no acidic
groups are observed may be due to bias in the sample.

Residue 19 is also variable and fully exposed,
containing P, R, I, S, K, Q, and L.



the main chain of residues 47 and 48. The OH at the
tip of the Y side group projects into the solvent.
Clearly one can vary the surface by substituting Y or F
so that the surface is either hydrophobic or
hydrophilic in that region. It is also possible that
the other aromatic amino acids (W or H) or the other
hydrophobics (L, M, or V) might be tolerated.

Residue 27 most often contains A, but S, K, L, and
T are also observed. On structural grounds, this
residue will probably tolerate any hydrophilic amino
acid and perhaps any amino acid.

Residue 28 is G in BPTI. This residue is in a
turn, but is not in a conformation peculiar to glycine.
Six other types of amino acids have been observed at
this residue: K, Q. «, and S.all side grou s 

at this residue might not contact HHMb simultaneously
with residues 17 and 34. Large side groups could
interact with HHMb at the same time as residues 17 and
34. Charged side groups at this residue could affect
binding of HHMb on the surface defined by the other
residues of the principal set. Any amino acid, except
perhaps P, should be tolerated.



Residue 29 is highly variable, most often
containing L. This fully exposed position will
probably tolerate almost any amino acid except,
perhaps, P.

Residues 31, 32, and 34 are highly variable,
exposed, and in extended conformations; any amino acid
should be tolerated.

Residues 48 and 49 are also highly variable and
fully exposed; any amino acid should be tolerated.

Residue 52 is in an alpha helix. Any amino acid,
except perhaps P, might be tolerated.
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Now we consider possible variation of the 
secondary set (Sec. 10.1.2) of residues that are in the
neighborhood of the principal set. Neighboring
residues that might be varied at later stages include
9(P), 11(T), 15(K), 16(A), 18(I), 20(R), 22(F), 24(N),
26(K), 35(Y), 47(S), 50(D), and 53(R).

Residue 9 is highly variable, extended, and
exposed. Residue 9 and residues 48 and 49 are
separated by a bulge caused by the ascending chain from
residue 31 to 34. For residue 9 and residues 48 and 49
to contribute simultaneously to binding, either the
target must have a groove into which the chain from 31
to 34 can fit, or all three residues (9, 48, and 49)
must have large amino acids that effectively reduce the
radius of curvature of the BPTI derivative.



Residue 11 is highly variable, extended, and
exposed. Residue 11, like residue 9, is slightly far
from the surface defined by the principal residues and
will contribute to binding in the same circumstances.

Residue 15 is highly varied. The side group of
residue 15 points away from the face defined by set #2.
Changes of charge at residue 15 could affect binding on
the surface defined by residue set #2.

Residue 16 is varied but points away from the
surface defined by the principal set. Changes in
charge at this residue could affect binding on the face
defined by set #2.

Residue 18 is I in BPTI. This residue is in an
extended conformation and is exposed. Five other amino
acids have been observed at this residue: M, F, L, V,
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and T. Only T is hydrophilic. The side group points 
directly away from the surface defined by residue set
#2. Substitution of charged amino acids at this
residue could affect binding at the surface defined by
residue set #2.

Residue 20 is R in BPTI. This residue is in an
extended conformation and is exposed. Four other amino
acids have been observed at this residue: A, S, L, and
Q. The side group points directly away from the
surface defined by residue set #2. Alteration of the
charge at this residue could affect binding at the
surface defined by residue set #2.



Residue 22 is only slightly varied, being Y, F, or
H in 30 of 33 cases. Nevertheless, A, N, and S have
been observed at this residue. Amino acids such as L,
H, I, or Q could be tried here. Alterations at residue
22 may affect the mobility of residue 21; changes in
charge at residue 22 could affect binding at the
surface defined by residue set #2.



i. 



Residue 24 shows so-e variation, but probably can 
not interact with one r.olecule of the target 
25 simultaneously with all the residues in the principal 
set. Variation in charge at this residue might have an 
effect on binding at the surface defined by the 
principal set. 

Residue 26 is highly varied and exposed. Changes 
in charge may affect binding at the surface defined by 
residue set #2; substitutions may affect the mobility 
of residue 27 that is in the principal set. 



Residue 35 is most often Y; W has been observed. 
The side group of 35 is buried, but substitution of F 
or W could affect the mobility of residue 34. 

Residue 47 is always T or S in the sequence sample 
used. The O(gamma) probably accepts a hydrogen bond from 
the NH of residue 50 in the alpha helix. Nevertheless, 
there is no overwhelming steric reason to preclude 
other amino acid types at this residue. In particular, 
other amino acids the side groups of which can accept 
hydrogen bonds, viz. N, D, Q, and E, may be acceptable 
here. 

Residue 50 is often an acidic amino acid, but 
other amino acids are possible. 

Residue 53 is often R, but other amino acids have 
been observed at this residue. Changes of charge may 
affect binding to the amino acids in interaction set 
#2. 

Stereo Figure 10 shows the residues in set #2, 
plus R39. From Figure 10, one can see that R39 is on 
the opposite side of BPTI from the surface defined by 
the residues in set #2. Therefore, variation at 
residue 39 at the same time as variation of some 
residues in set #2 is much less likely to improve 
binding that occurs along surface #2 than is variation 
of the other residues in set #2. 



In addition to the twelve principal residues and 
13 secondary residues, there are two other residues, 
30(C) and 33(F), involved in surface #2 that we will 
probably not vary, at least not until late in the 
procedure. These residues have their side groups 
buried inside BPTI and are conserved. Changing these 
residues does not change the surface nearly so much as 
does changing residues in the principal set. These 
buried, conserved residues do, however, contribute to 
the surface area of surface #2. The surface of residue 
set #2 is comparable to the area of the trypsin-binding 
surface. Principal residues 17, 19, 21, 27, 28, 29, 
31, 32, 34, 48, 49, and 52 have a combined solvent- 
accessible area of 946.9 A^2. Secondary residues 9, 11, 
15, 16, 18, 20, 22, 24, 26, 35, 47, 50, and 53 have a 
combined surface of 1041.7 A^2. Residues 30 and 33 have 
exposed surface totaling 38.2 A^2. Thus the three 
groups' combined surface is 2026.8 A^2. 



Residue 30 is C in BPTI and is conserved in all 
homologous sequences. It should be noted, however, 
that C14/C38 is conserved in all natural sequences, yet 
Marks et al. (MARK87) showed that changing both C14 and 
C38 to A,A or T,T yields a functional trypsin 
inhibitor. Thus it is possible that BPTI-like 
molecules will fold if C30 is replaced. 




Residue 33 is F in BPTI and in all homologous 
sequences. Visual inspection of the BPTI structure 
suggests that substitution of Y, M, H, or L might be 
tolerated. 



Having identified twenty residues that define a 
possible binding surface, we must choose some to vary 
first. Given our hypothetical affinity separation 
sensitivity, C_sensi, we decide to vary six residues, 
leaving some margin for errors in the actual base 
composition of variegated bases. To obtain maximal 
recognition, we choose residues from the principal set 
that are as far apart as possible. Table 36 shows the 
distances between the beta carbons of residues in the 
principal and peripheral set. R17 and V34 are at one 
end of the principal surface. Residues A27, G28, L29, 
A48, and M52 are at the other end, about twenty 
Angstroms away; of these, we will vary residues 17, 27, 
29, 34, and 48. Residues 28, 49, and 52 will be varied 
at later rounds. 



Of the remaining principal residues, 21 is left to 
later variations. Among residues 19, 31, and 32, we 
arbitrarily pick 19 to vary. 

Unlimited variation of six residues produces 6.4 x 
10^7 amino acid sequences. By hypothesis, C_sensi is 1 
in 4 x 10^[?]. Table 37 shows the programmed variegation 
at the chosen residues. The parental sequence is 
present as 1 part in 5.5 x 10^[?], but the least favored 
sequences are present at only 1 part in 4.2 x 10^[?]. 
Among single-amino-acid substitutions from the PPBD, 
the least favored is F17-I19-A27-L29-V34-A48 and has a 
calculated abundance of 1 part in 1.6 x 10^[?]. Using the 
optimal qfk codon, we can recover the parental sequence 
and all one-amino-acid substitutions to the PPBD if 
actual nt compositions come within 5% of programmed 
compositions. The number of transformants is M_ntv ~ 
1.0 x 10^[?] (also by hypothesis), thus we will produce 
most of the programmed sequences. 
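
The count of sequences for unlimited variation follows from simple combinatorics. The lines below are a minimal sketch, assuming that each varied residue may be any of the twenty amino acids and that stop codons are ignored; the function name is illustrative and does not come from the specification.

```python
def unrestricted_library_size(n_residues: int, n_amino_acids: int = 20) -> int:
    """Number of distinct amino-acid sequences when each of n_residues
    positions may hold any of n_amino_acids types (stops ignored)."""
    return n_amino_acids ** n_residues


six = unrestricted_library_size(6)   # six varied residues, as chosen above
five = unrestricted_library_size(5)  # fallback if only five can be varied

print(f"{six:.1e}")   # 6.4e+07
print(f"{five:.1e}")  # 3.2e+06
```

Comparing such a count with the number of transformants that can be obtained indicates how many residues can sensibly be varied in one round.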



The residue numbers of the preceding section are 
referred to mature BPTI (R1-P2-...-A58). Table 25 has 
residue numbers referring to the pre-M13CP-BPTI 
protein; all mature BPTI sequence numbers have been 
increased by the length of the signal sequence, i.e. 
23. Thus in terms of the pre-OSP-PBD residue numbers, 
we wish to vary residues 40, 42, 50, 52, 57, and 71. A 
DNA subsequence containing all these codons is found 
between the (Apa I/Dra II/Pss I) sites at base 191 and 
the Sph I site at base 309 of the osp-pbd gene. Among 
Apa I, Dra II, and Pss I, Apa I is preferred because it 
recognizes six bases without any ambiguity. Dra II and 
Pss I, on the other hand, recognize six bases with two- 
fold ambiguity at two of the bases. The vgDNA will 
contain more Dra II and Pss I recognition sites at the 
varied locations than it will contain Apa I recognition 
sites. The unwanted extraneous cutting of the vgDNA by 
Apa I and Sph I will eliminate a few sequences from our 
population. This is a minor problem, but by using the 
more specific enzyme (Apa I), we minimize the unwanted 
effects. The sequence shown in Table 37 illustrates an 
additional way in which gratuitous restriction sites 
can be avoided in some cases. The osp-pbd gene had 
the codon GGC for G51; because we are varying both 
residue 50 and 52, it is possible to obtain an Apa I 
site. If we change the glycine codon to GGT, the Apa I 
site can no longer arise. Apa I recognizes the DNA 
sequence (GGGCC/C). 
Each piece of dsDNA to be synthesized needs six to 
eight bases added at either end to allow cutting with 
restriction enzymes and is shown in Table 37. The 
first synthetic base (before cutting with Apa I and Sph 
I) is 184 and the last is 322. There are 142 bases to 
be synthesized. The center of the piece to be 
synthesized lies between Q54 and V57. The overlap can 
not include varied bases, so we choose bases 245 to 255 
as the overlap that is 12 bases long. Note that the 
codon for F56 has been changed to TTC to increase the 
GC content of the overlap. The amino acids that are 
being varied are marked as X with a plus over them. 
Codons 57 and 71 are synthesized on the sense (bottom) 
strand. The design calls for "qfk" in the antisense 
strand, so that the sense strand contains (from 5' to 
3') a) equal parts C and A (i.e. the complement of k), 
b) (0.40 T, 0.22 A, 0.22 C, and 0.16 G) (i.e. the 
complement of f), and c) (0.26 T, 0.26 A, 0.30 C, and 
0.18 G). 

Each residue that is encoded by "qfk" has 21 
possible outcomes, each of the amino acids plus stop. 
Table 12 gives the distribution of amino acids encoded 
by "qfk", assuming 5% errors. The abundance of the 
parental sequence is the product of the abundances of R 
x I x A x L x V x A. The abundance of the least- 
favored sequence is 1 in 4.2 x 10^[?]. 
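
The relation between the mixtures on the two strands, and the resulting amino-acid distribution, can be sketched as follows. This is a minimal illustration and not the calculation behind Table 12: it assumes the three sense-strand mixtures quoted above are the complements of k, f, and q respectively, it uses the standard genetic code, and it ignores the 5% synthesis error. All function and variable names are hypothetical.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_mix(mix):
    """Complement a single-position base mixture {base: fraction}."""
    return {COMPLEMENT[b]: f for b, f in mix.items()}


def other_strand(codon_mix):
    """Per-position mixtures of the complementary strand, read 5'->3'."""
    return [complement_mix(m) for m in reversed(codon_mix)]


# Standard genetic code, bases ordered T, C, A, G at each codon position.
_B = "TCAG"
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: _AA[16 * i + 4 * j + k]
        for i, a in enumerate(_B)
        for j, b in enumerate(_B)
        for k, c in enumerate(_B)}


def amino_acid_distribution(codon_mix):
    """Probability of each amino acid ('*' = stop) for a 3-position mixture."""
    dist = {}
    for combo in product(*(m.items() for m in codon_mix)):
        codon = "".join(base for base, _ in combo)
        p = 1.0
        for _, frac in combo:
            p *= frac
        dist[CODE[codon]] = dist.get(CODE[codon], 0.0) + p
    return dist


# Sense-strand mixtures as quoted in the text (assumed reading of the values).
sense = [
    {"C": 0.5, "A": 0.5},
    {"T": 0.40, "A": 0.22, "C": 0.22, "G": 0.16},
    {"T": 0.26, "A": 0.26, "C": 0.30, "G": 0.18},
]
qfk = other_strand(sense)          # [q, f, k] on the antisense strand
dist = amino_acid_distribution(qfk)

# Abundance of the parental sequence R-I-A-L-V-A at the six varied codons.
parental = 1.0
for aa in "RIALVA":
    parental *= dist[aa]
print(len(dist), parental)
```

Because k contains only T and G, each "qfk" codon reaches all twenty amino acids but only one of the three stop codons, which is why 21 outcomes are possible.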

Olig#27 and olig#28 are annealed and extended with 
Klenow fragment and all four (nt)TPs. Both the ds 
synthetic DNA and RF pLC7 DNA are cut with both Apa I 
and Sph I. The cut DNA is purified and the appropriate 
pieces ligated (See Sec. 14.1) and used to transform 
competent PE383 (Sec. 14.2). In order to generate a 
sufficient number of transformants, V is set to 5000 
ml. 



1) culture E. coli in 5.0 l of LB broth at 27°C 
until cell density reaches 5 x 10^[?] to 7 x 10^[?] 
cells/ml, 

2) chill on ice for 65 minutes, centrifuge the 
cell suspension at 4000 g for 5 minutes at 4°C, 

3) discard supernatant; resuspend the cells in 
1667 ml of an ice-cold, sterile solution of 60 
mM CaCl2, 

4) chill on ice for 15 minutes, and then 
centrifuge at 4000 g for 5 minutes at 4°C, 

5) discard supernatant; resuspend cells in 2 x 
400 ml of ice-cold, sterile 60 mM CaCl2; store 
cells at 4°C for 24 hours, 

6) add DNA in ligation or TE buffer; mix and 
store on ice for [?]0 minutes; 20 ml of solution 
containing 5 ug/ml of DNA is used, 

7) heat shock cells at 42°C for 90 seconds, 

8) add 200 ml LB broth and incubate at 21°C for 
1 hour, 

9) add the culture to 2.0 l of LB broth 
containing ampicillin at 35-100 ug/ml and 
culture for 2 hours at 37°C, 

10) centrifuge at [?]000 g for 20 minutes at 4°C, 

11) discard supernatant, resuspend cells in 50 
ml of LB broth plus ampicillin and incubate 1 
hour at 37°C, 

12) plate cells on LB agar containing 
ampicillin, 

13) harvest virions by method of Salivar et al. 
(SALI64). 



The heat shock of step (7) can be done by dividing the 
200 ml into 100 200 ul aliquots in 1.5 ml plastic 
Eppendorf tubes. It is possible to optimize the heat 
shock for other volumes and kinds of container. It is 
important to: a) use all or nearly all the vgDNA 
synthesized in ligation, this will require large 
amounts of pLC7 backbone, b) use all or nearly all the 
ligation mixture to transform cells, and c) culture all 
or nearly all the transformants at high density. These 
measures are directed at maintaining diversity. 

IPTG is added to the growth medium at 2.0 (the 
optimal level) and virions are harvested in the usual 
way (Sec. 14.3). It is important to collect virions in 
a way that samples all or nearly all the transformants. 
Because F- cells are used in the transformation, 
multiple infections do not pose a problem. 

HHMb has a pI of 7.0 and we carry out 
chromatography at pH 8.0 so that HHMb is slightly 
negative while BPTI and most of its mutants are 
positive. HHMb is fixed (Sec. 15.1) to a 2.0 ml column 
on Affi-Gel 10(TM) or Affi-Gel 15(TM) at 4.0 mg/ml 
support matrix, the same density that is optimal for a 
column supporting trp. 

We note that charge repulsion between BPTI and 
HHMb should not be a serious problem and does not 
impose any constraints on ions or solutes allowed as 
eluants. Neither BPTI nor HHMb has special 
requirements that constrain choice of eluants. The 
eluant of choice is KCl in varying concentrations. 

To remove variants of BPTI with strong, 
indiscriminate binding for any protein or for the 
support matrix (Sec. 15.2), we pass the variegated 
population of virions over a column that supports 
bovine serum albumin (BSA) before loading the 
population onto the {HHMb} column. Affi-Gel 10(TM) or 
Affi-Gel 15(TM) is used to immobilize BSA at the 
highest level the matrix will support. A 10.0 ml 
column is loaded with 5.0 ml of Affi-Gel-linked-BSA; 
this column, called {BSA}, has V_v = 5.0 ml. The 
variegated population of virions containing 10^[?] pfu in 
1 ml (0.2 x V_v) of 10 mM KCl, 1 mM phosphate, pH 8.0 
buffer is applied to {BSA}. We wash {BSA} with 4.5 ml 
(0.9 x V_v) of 50 mM KCl, 1 mM phosphate, pH 8.0 buffer. 
The wash with 50 mM salt will elute virions that adhere 
slightly to BSA but not virions with strong binding. 
The pooled effluent of the {BSA} column is 5.5 ml of 
approximately 13 mM KCl. 



The column {HHMb} is first blocked by treatment 
with 10^[?] virions of M13(am429) in 100 ul of 10 mM KCl 
buffered to pH 8.0 with phosphate; the column is washed 
with the same buffer until OD260 returns to base line 
or 2 x V_v have passed through the column, whichever 
comes first. The pooled effluent from {BSA} is added 
to {HHMb} in 5.5 ml of 13 mM KCl, 1 mM phosphate, pH 
8.0 buffer. The column is eluted (Sec. 15.3) in the 
following way: 



1) 10 mM KCl buffered to pH 8.0 with phosphate, 
until optical density at 280 nm falls to base line 
or 2 x V_v, whichever is first, (effluent 
discarded), 

2) a gradient of 10 mM to 2 M KCl in 3 x V_v, pH 
held at 8.0 with phosphate, (30 x 100 ul 
fractions), 

3) a gradient of 2 M to 5 M KCl in 3 x V_v, 
phosphate buffer to pH 8.0, (30 x 100 ul 
fractions), 

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl 
in 2 x V_v, with phosphate buffer to pH 8.0, (20 x 
100 ul fractions), and 

5) constant 5 M KCl plus 0.8 M guanidinium Cl in 1 
x V_v, with phosphate buffer to pH 8.0, (10 x 100 
ul fractions). 
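
The salt concentration of each collected fraction follows directly from the gradient limits. The sketch below assumes ideal linear gradients and equal-volume fractions, and reports the mean KCl concentration of each fraction; the function name is illustrative only.

```python
def linear_gradient(start_mM: float, end_mM: float, n_fractions: int):
    """Mean concentration (mM) of each equal-volume fraction collected
    from an ideal linear gradient running from start_mM to end_mM."""
    step = (end_mM - start_mM) / n_fractions
    return [start_mM + step * (i + 0.5) for i in range(n_fractions)]


# KCl gradients of the two gradient steps: 10 mM -> 2 M, then 2 M -> 5 M.
step_10mM_to_2M = linear_gradient(10.0, 2000.0, 30)
step_2M_to_5M = linear_gradient(2000.0, 5000.0, 30)

# Total number of 100 ul fractions collected over the four fractionated steps.
total_fractions = 30 + 30 + 20 + 10
print(total_fractions)  # 90
```

A table of this kind makes it straightforward to convert the fraction in which a clone elutes into an approximate KCl concentration.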

In addition to the elution fractions, a sample is 
removed from the column and used as an inoculum for 
phage-sensitive Sup- cells (Sec. 15.4). A sample of 4 
ul from each fraction is plated on phage-sensitive Sup- 
cells. Fractions that yield too many colonies to 
count are replated at lower dilution. An approximate 
titre of each fraction is calculated. Starting with 
the last fraction and working toward the first fraction 
that was titered, we pool fractions until approximately 
10^[?] phage are in the pool, i.e. about 1 part in 10^[?] of 
the phage applied to the column. This population is 
infected into 3 x 10^[?] phage-sensitive PE384 in 300 ml 
of LB broth. The very low multiplicity of infection 
(moi) is chosen to reduce the possibility of multiple 
infection. After thirty minutes, viable phage have 
entered recipient cells but have not yet begun to 
produce new phage. Phage-borne genes are expressed at 
this phase, and we can add ampicillin that will kill 
uninfected cells. These cells still carry F-pili and 
will absorb phage, helping to prevent multiple 
infections. 

If multiple infection should pose a problem that 
cannot be solved by growth at low multiplicity-of-infection 
on F+ cells, the following procedure can be employed to 
obviate the problem. Virions obtained from the 
affinity separation are infected into F+ E. coli and 
cultured to amplify the genetic messages (Sec. 15.5). 
CCC DNA is obtained either by harvesting RF DNA or by 
in vitro extension of primers annealed to ss phage DNA. 
The CCC DNA is used to transform F+ cells at a high 
ratio of cells to DNA. Individual virions obtained in 
this way should bear only proteins encoded by the DNA 
within. 

The variegation produced as many as 6.4 x 10^7 
different amino-acid sequences. C_eff is 900. Thus, 
after two separation cycles, the probability of 
isolating a single SBD is less than 0.10; after three 
cycles, the probability rises above 0.10. 
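
The reasoning behind these thresholds can be illustrated with a deliberately simple model. The sketch assumes that each separation cycle enriches a binding clone about 900-fold over the background and that the clone starts as one sequence among 6.4 x 10^7; both the model and the function name are illustrative assumptions, not the method of the specification.

```python
def expected_fraction(enrichment_per_cycle: float,
                      n_cycles: int,
                      library_size: float) -> float:
    """Expected fraction of the pool made up by one initially unique
    clone after n_cycles of enrichment, capped at 1."""
    return min(1.0, enrichment_per_cycle ** n_cycles / library_size)


LIBRARY = 6.4e7   # 20**6 possible sequences
ENRICH = 900.0    # assumed enrichment per separation cycle

for cycles in (1, 2, 3):
    print(cycles, expected_fraction(ENRICH, cycles, LIBRARY))
```

Under these assumptions two cycles leave the clone at roughly one percent of the pool, while a third cycle brings it well above the 0.10 level.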

The phagemid population is grown and 
chromatographed three times and then examined for SBDs 
(Sec. 15.7). In each separation cycle, phage from the 
last three fractions that contain viable phage are 
pooled with phage obtained by removing some of the 
support matrix as an inoculum. At each cycle, about 
10^[?] phage are loaded onto the column and about 10^[?] 
phage are cultured for the next separation cycle. 
After the third separation cycle, 32 colonies are 
picked from the last fraction that contained viable 
phage; phage from these colonies are denoted SBD1, 
SBD2, ..., and SBD32. 

Each of the SBDs is cultured and tested for 
retention on a Pep-Tie column supporting HHMb (Sec. 
15.8). Phage LC7(SBD11) shows the greatest retention 
on the Pep-Tie {HHMb} column, eluting at 367 mM KCl 
while wtM13 elutes at 20 mM KCl. SBD11 becomes the 
parental amino-acid sequence to the second variegation 
cycle. 



The result of this hypothetical experiment is 
shown in Table 38. R40 changed to D, I42 changed to 
Q, A50 changed to E, L52 remained L, and A71 changed to 
W. 



The next round of variegation (Sec. 16) is 
illustrated in Table 39. The residues to be varied are 
chosen by: a) choosing some of the residues in the 
principal set that were not varied in the first round 
(viz. residues 42, 44, 51, 54, 55, 72, or 75 of the 
fusion), and b) choosing some residues in the secondary 
set. Residues 51, 54, 55, and 72 are varied through 
all twenty amino acids and, unavoidably, stop. Residue 
44 is only varied between Y and F. Some residues in 
the secondary set are varied through a restricted 
range, primarily to allow different charges (+, 0, -) 
to appear. Residue 38 is varied through K, R, E, or G. 
Residue 41 is varied through I, V, K, or E. Residue 43 
is varied through R, S, G, N, K, D, E, T, or A. 

Olig#29 and olig#30 are synthesized, annealed, 
extended and cloned into pLC7 at the Apa I/Sph I sites. 
The ligation mixture is used to transform 5 l of 
competent PE383 cells so that 10^[?] transformants are 
obtained. A new {HHMb} is constructed using the same 
support matrix as was used in round 1. A sample 
of the harvested LC7 is applied to {HHMb} and 
affinity separated. The last 10^[?] phage off the column 
and an inoculum are pooled and cultured. The cultured 
phagemids are re-chromatographed for three separation 
cycles. Thirty-two clonal isolates (denoted SBD11-1, 
SBD11-2, ..., SBD11-32) are obtained from the effluent 
of the third separation cycle and tested for binding on 
a Pep-Tie {HHMb} column. Of this set, SBD11-23 shows 
the greatest retention on the Pep-Tie {HHMb} column, 
eluting at 692 mM KCl. 



The results of this hypothetical selection are 
shown in Table 40. Residue 38 (K15 of BPTI) changed to 
E, 41 becomes V, 43 goes to N, 44 goes to F, 51 goes to 
F, 54 goes to S, 55 goes to A, and 72 goes to Q. 

The sbd11-23 portion of the osp-pbd gene is cloned 
into an expression vector and BPTI(E15, D17, V18, Q19, 
N20, F21, E27, F28, L29, S31, A32, S34, W71, Q72) is 
expressed in the periplasm. This protein is isolated 
by standard methods and its binding to HHMb is tested. 
K_d is found to be 4.5 x 10^[?] M. 

A third round of variation, using SBD11-23 as 
PPBD, is illustrated in Table 41; eight amino acids are 
varied. Those in the principal set, residues 40, 55, 
and 57, are varied through all twenty amino acids. 
Residue 22 is varied through P, Q, T, K, A, or E. 
Residue 34 is varied through T, P, Q, K, A, or E. 
Residue 44 is varied through F, L, Y, C, W, or stop. 
Residue 50 is varied through E, K, or Q. Residue 52 is 
varied through L, F, I, M, or V. 

The result of this variation is shown in Table 42. 
The selected SBD is denoted SBD11-23-5 and elutes from 
a Pep-Tie {HHMb} column at 980 mM KCl. The sbd11-23-5 
segment is cloned into an expression vector and 
BPTI(E5, Q11, E15, A17, V18, Q19, N20, W21, Q27, F28, 
M29, S31, L32, [?], W71, Q72) is produced. This time 
the K_d is 7.3 x 10^[?] M. 

This example is hypothetical. It is anticipated 
that more variegation cycles will be needed to achieve 
dissociation constants of 10^[?] M. It is also possible 
that more than three separation cycles will be needed 
in some variegation cycles. Real DNA chemistry and DNA 
synthesizers may have larger errors than our 
hypothetical 5%. If S_err > 0.05, then we may not be 
able to vary six residues at once. Variation of 5 
residues at once is certainly possible. 

Table 2: Preferred Outer-Surface Proteins 

Genetic         Preferred 
Package         Outer-Surface Protein   Reason for preference 

M13             coat protein            a) exposed amino terminus, 
                (gpVIII)                b) predictable post- 
                                           translational 
                                           processing, 
                                        c) numerous copies in 
                                           virion. 

                gp III                  a) fusion data available. 

PhiX174         G protein               a) known to be on virion 
                                           exterior, 
                                        b) small enough that 
                                           the G-ipbd gene can 
                                           replace H gene. 

E. coli         LamB                    a) fusion data available, 
                                        b) non-essential. 

B. subtilis     CotC                    a) no post-translational 
spores                                     processing, 
                                        b) distinctive sequence 
                                           that causes protein to 
                                           localize in spore coat, 
                                        c) non-essential. 

                CotD                    Same as for CotC. 

Table 3: Ambiguous DNA for AA_seq2 

  m      k      k      s      l      v      l      k 
  1      2      3      4      5      6      7      8 
A.T.G  A.A.r  A.A.r  T.C.n  T.T.r  G.T.n  T.T.r  A.A.r 
                     A.G.y  C.T.n         C.T.n 

  a      s      v      a      v      a      t      l 
  9     10     11     12     13     14     15     16 
G.C.n  T.C.n  G.T.n  G.C.n  G.T.n  G.C.n  A.C.n  T.T.r 
       A.G.y                                     C.T.n 

  v      p      m      l      s      f      a      r 
 17     18     19     20     21     22     23     24 
G.T.n  C.C.n  A.T.G  T.T.r  T.C.n  T.T.y  G.C.n  C.G.n 
                     C.T.n  A.G.y                A.G.r 

  p      d      f      c      l      e      p      p 
 25     26     27     28     29     30     31     32 
C.C.n  G.A.y  T.T.y  T.G.y  T.T.r  G.A.r  C.C.n  C.C.n 
                            C.T.n 

  y      t      g      p      c      k      a      r 
 33     34     35     36     37     38     39     40 
T.A.y  A.C.n  G.G.n  C.C.n  T.G.y  A.A.r  G.C.n  C.G.n 
                                                 A.G.r 

  i      i      r      y      f      y      n      a 
 41     42     43     44     45     46     47     48 
A.T.h  A.T.h  C.G.n  T.A.y  T.T.y  T.A.y  A.A.y  G.C.n 
              A.G.r 

  k      a      g      l      c      q      t      f 
 49     50     51     52     53     54     55     56 
A.A.r  G.C.n  G.G.n  T.T.r  T.G.y  C.A.r  A.C.n  T.T.y 
                     C.T.n 

  v      y      g      g      c      r      a      k 
 57     58     59     60     61     62     63     64 
G.T.n  T.A.y  G.G.n  G.G.n  T.G.y  C.G.n  G.C.n  A.A.r 
                                   A.G.r 

  r      n      n      f      k      s      a      e 
 65     66     67     68     69     70     71     72 
C.G.n  A.A.y  A.A.y  T.T.y  A.A.r  T.C.n  G.C.n  G.A.r 
A.G.r                              A.G.y 

  d      c      m      r      t      c      g      g 
 73     74     75     76     77     78     79     80 
G.A.y  T.G.y  A.T.G  C.G.n  A.C.n  T.G.y  G.G.n  G.G.n 
                     A.G.r 

  a      a      e      g      d      d      p      a 
 81     82     83     84     85     86     87     88 
G.C.n  G.C.n  G.A.r  G.G.n  G.A.y  G.A.y  C.C.n  G.C.n 

  k      a      a      f      n      s      l      q 
 89     90     91     92     93     94     95     96 
A.A.r  G.C.n  G.C.n  T.T.y  A.A.y  T.C.n  T.T.r  C.A.r 
                                   A.G.y  C.T.n 

  a      s      a      t      e      y      i      g 
 97     98     99    100    101    102    103    104 
G.C.n  T.C.n  G.C.n  A.C.n  G.A.r  T.A.y  A.T.h  G.G.n 
       A.G.y 

  y      a      w      a      m      v      v      v 
105    106    107    108    109    110    111    112 
T.A.y  G.C.n  T.G.G  G.C.n  A.T.G  G.T.n  G.T.n  G.T.n 

  i      v      g      a      t      i      g      i 
113    114    115    116    117    118    119    120 
A.T.h  G.T.n  G.G.n  G.C.n  A.C.n  A.T.h  G.G.n  A.T.h 

  k      l      f      k      k      f      t      s 
121    122    123    124    125    126    127    128 
A.A.r  T.T.r  T.T.y  A.A.r  A.A.r  T.T.y  A.C.n  T.C.n 
       C.T.n                                     A.G.y 
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Table 4: Table of Restriction Enzymes 

Table of restriction enzymes with IUB codes. 
Suppliers: 

S=Sigma Chemical Co. 
P.O. Box 14508 
St. Louis, Mo. 63178 

B=Bethesda Research Laboratories 
P.O. Box 6009 
Gaithersburg, Maryland, 20877 

M=Boehringer Mannheim Biochemicals 
7941 Castleway Drive 
Indianapolis, Indiana, 46250 

I=International Biochemicals, Inc. 
P.O. Box 9558 
New Haven, Connecticut, 06535 

N=New England BioLabs 
32 Tozer Road 
Beverly, Massachusetts, 01915 

P=Promega 
2800 S. Fish Hatchery Road 
Madison, Wisconsin, 53711 

T=Stratagene Cloning Systems 
11099 North Torrey Pines Road 
La Jolla, California, 92037 

+ before enzyme name means that overhang can not be 
self-complementary, 
% before enzyme name means that overhang may or may 
not be self-complementary. 
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Table 4, continued. 






Enzyme      Recognit.      Symm  cuts 



Aat II 
t Acc I 



CACCrc 
CTHKAC 



ACC III TCCCCA ■ 
Acv I CRCCYC 
ACl II CTTAAC 
t Af I IIXACRYCT ■ 
10 Aha III TTTAAA 

I CACN'N'NCTC 

Aoa I CCCCCC 

ApaL I GTGCAC 

15 Ase I ATTAAT 

Asp7 18 CCTACC 

Asu II TTCCAA 

t Ava I CYCCRC 

20 Ava III ATCCAT 



Avr II 
Dai I • 
BanH I 
% Bnn I 
Bbe I 
IBby I 
t Pbv II 
BcL I 
^• Bql I 
Bsl TI 
+ Bin I 
» Bsnt I 
BspH I 
tBsoM I 
BSSH II 
4 BstE I 
t 3stX I 
Cfr I 
Cla I 



CCTACG 
TCGCCA 
GGATCC 
CGYRCC 
CGCGCC 
CCACC 
CAACAC 
TGATCA 
CCCrCNNNNGCC 
AGATCT 
GCATC 
CAATCCN 
TCATCA 
ACCTCC 
CCGCGC 

igct:jacc 

CCAr.'NNHKNTCG 

YCCCCR 

ATCCAT 



, P 
P 
P 
P 
P 
P 
P 

P 
R 
P 
P 
P 
P 
P 



5, 
2, 
1, 
2, 
I, 
1. 
3, 



1 . 
2 , 



5, 1 



1. 
3 , 
1, 
1, 
5, 



nP L3 , 17 
nP 3.12 

P 1,5 

■P 




supply 
<S,M, I,N,T 
<B.«. I,N, P,T 
<T 

< Aha II :H . 
<N 

<ncr.e 

<3,T &Dra r 
P 

<r; 

<:M. I.N. P,T 

<N, T 

<N 

<none 

<P,N{EstB r) 
<S? , S,M. I , P, 

Acu I :T 
<T; Nsi I:M,N, 

EOOT2 2 r:T 

<U 

<S, B. I . t;,T 

<S, B. M, I , N, P,T 

<M. I 

< 

<I.W,T 
< 

<S, 3,M, I, :i.T 
<S, 3, I,N, F,T 



+Dra II     RGGNCCY        P   2,5 
+Dra III    CACNNNGTG      P   6,3 
Eco47 III   AGCGCT         P   3,3 
+EcoN I     CCTNNNNNAGG    P   5,6 
EcoR I      GAATTC         P   1,5 
EcoR V      GATATC         P   3,3 
+Esp I      GCTNAGC        P   2,5    T 
%Fok I      GGATG          nP  14,18  M,N,T 
Gdi II      YGGCCG         nP  1,5 
Hae I       WGGCCW         P   3,3 
Hae II      RGCGCY         P   5,1    S,B,M,I,N,T 



<S. 3, M, N, T 

<: E'.e I;N,T 
<3, 3, >\,Xl,T: 

SP.-^n 111:1 

<::,T ; ^ cocno? 

<.X. X, T 
<nor.e 
<;:(30on) 
<S, 3..M. 1 
< S . D . M . I 



N , P . T 
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I 
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Table 4 , continued. 





Enzyme      Recognit.      Symm cuts   supply 

+Hga I      GACGC          nP  10,15  N 
%HgiA I     GWGCWC         P   5,1    N 
%HgiC I     GGYRCC         P   0,6 
%HgiJ II    GRGCYC         P   5,1    Ban II:S,M,I,N,T 
Hind II     GTYRAC         P   3,3    M; Hinc II:S,B,I,N,P,T 
Hind III    AAGCTT         P   1,5    S,B,M,I,N,P,T 
Hpa I       GTTAAC         P   3,3    S,B,M,I,P,T 
+Hph I      GGTGA          nP  13,12  N,T 
Kpn I       GGTACC         P   5,1    S,B,M,I,N,P,T 
+Mbo II     GAAGA          nP  13,12  S,B,I,N 
Mlu I       ACGCGT         P   1,5    M,N,P,T 
Mst I       TGCGCA         P   3,3    T; Fsp I:S,N 
Nae I       GCCGGC         P   3,3    M,N,T 
Nar I       GGCGCC         P   2,4    B,N,T 
Nco I       CCATGG         P   1,5    B,M,N,P,T 
Nde I       CATATG         P   2,4    B,N,T 
Nhe I       GCTAGC         P   1,5    M,N,P,T 
Not I       GCGGCCGC       P   2,6    M,N,P,T 
Nru I       TCGCGA         P   3,3    B,M,N,T 
Nsp7524 I   RCATGY         P   5,1 
NspB II     CMGCKG         P   3,3 
+PflM I     CCANNNNNTGG    P   7,4    N 
+Ple I      GAGTCNNNN      nP  9,10   N 
PmaC I      CACGTG         P   3,3    none 
+PpuM I     RGGWCCY        P   2,5    N 
+Pss I      RGGNCCY        P   5,2    I 
Pst I       CTGCAG         P   5,1    S,B,M,I,N,P,T 
Pvu I       CGATCG         P   4,2    S,B,N,B(Xor II),M,P,T 
Pvu II      CAGCTG         P   3,3    S,B,M,I,N,P,T 
+Rsr II     CGGWCCG        P   2,5    M,T 
Sac I       GAGCTC         P   5,1    B(Sst I),M,I,N,P,T 
Sac II      CCGCGG         P   4,2    B(Sst II),I,N,P,T 
Sal I       GTCGAC         P   1,5    B,M,I,N,P,T 
+Sau I      CCTNAGG        P   2,5    M; Cvn I:B; Mst II:T; 
                                      Bsu36 I:N; Aoc I:T 
Sca I       AGTACT         P   3,3    M,N,P,T 
%SfaN I     GCATC          nP  10,14  N 
+Sfi I      GGCCNNNNNGGCC  P   8,5    N,P,T 
Sma I       CCCGGG         P   3,3    B,M,I,N,P,T 
SnaB I      TACGTA         P   3,3    M,N,T 
Spe I       ACTAGT         P   1,5    M,N,T 
Sph I       GCATGC         P   5,1    B,M,I,N,P,T 
Ssp I       AATATT         P   3,3    M,N,T 
Stu I       AGGCCT         P   3,3    M,N,I(Aat I),P,T 
%Sty I      CCWWGG         P   1,5    N,P,T 
%Taq II     GACCGA         nP  17,15  none 
%Taq II     CACCCA         nP  17,15  none 
+Tth111 I   GACNNNGTC      P   4,5    I,N,T 
%Tth111 II  CAARCA         nP  16,14  none 
Xba I       TCTAGA         P   1,5    B,M,I,N,P,T 
Xca I       GTATAC         P   3,3    N(soon) 
Xho I       CTCGAG         P   1,5    B,M,I,P,T; Ccr I:T; 
                                      PaeR7 I:N 
Xho II      RGATCY         P   1,5    M,T; N(BstY I) 
Xma I       CCCGGG         P   1,5    I,N,P,T 
Xma III     CGGCCG         P   1,5    B; [?] I:N; Eco52 I:T 
Xmn I       GAANNNNTTC     P   5,5    N,M(Asp700),T 

Notes: 

Symm: P for palindromic, nP for non-palindromic. 

cuts: first number indicates position of cut in 
top strand, 1 means after first base of 
recognition; second number indicates 
position of cut in lower strand, counting 
left-to-right. 
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Table 5: Potential sites in ipbd gene. 



J 



i 



15 



25 



35 



AO 



45 



Sutnmdry of cuts. 

= % Acc I has 3 elective sites 
= Af I ir has 1 elective sites 
Ana I has 2 elective sites : 
- Apu II has 1 elective sites 
= Ava III hcs 1 elective sites 

1 elective sites 

2 elective sites 
I elective sites 

3 elective sites 
2 elective sites 

+ F.SD I r.-is 2 elective sites 
Hind III has 6 elective sites 



Bsp.^ II has 
•BzzH II has 
t BstX I has 
+Dra II has 
+EcoM I has 



Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 

Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 



Enz - t Stv I has 6 elective sites 



has 
has 



Kon I 

Mlu I 

Nar I has 

Nco I has 

r .'he I 

Urru I 



has 
has 



1 elective sites : 

1 elective sites : 

2 elective sites : 
1 elective sites : 

3 electiva sites : 

2 elective sites : 

+PflM I has 1 elective sites 
PnaC I has 1 elective sites 
+ ppuri I has 2 elective sites 
+ R^r II has 1 elective sites 
+S_f_i I hjs 2 elective sites 
See I has 3 elective sites : 
Soh I has 1 elective sites : 

to I has 5 elective sites : 



96 169 281 
19 

102- 103 

381 
: 314 • 
: 7 2 

: 67 115 
: 323 

: 102 103 226 
: 62 94 
57 187 
: 9 23 60 
287 361 386 
48 
3 14 

238 343 
323 ■ 

388 



25 289 
38 65 
: 94 
; -223 
: 102 226 
: 1C2 

24 261 
12 45 379 
221 

23 70 150 
237 386 
11 4 4- 
143 263 323 3S3 



Enz 
Enz 
Enz 
' Enz 



Xba I has 1 
Xca I has 2 
I has 1 



elective sites 
elective sites 
elective sites 



Xho 

= Xna III has 3 elective sites 



84 

96 169 
85 

: 70 209 
Enzymes not cutting ipbd.

Avr II    BamH I    Bcl I    BstE II
EcoR I    EcoR V    Hpa I    Not I
Sac I     Sal I     Sau I    Sma I
Xma I






Table 6: Exposure of amino acid types in T4 lzm & HEWL.

HEADER    HYDROLASE (O-GLYCOSYL)      18-AUG-86   2LZM
COMPND    LYSOZYME (E.C.3.2.1.17)
AUTHOR    L.H.WEAVER, B.W.MATTHEWS

Coordinates from Brookhaven Protein Data Bank: 1LYM.
Only Molecule A was considered.

HEADER    HYDROLASE (O-GLYCOSYL)      29-JUL-82   1LYM
COMPND    LYSOZYME (E.C.3.2.1.17)
AUTHOR    J.HOGLE, S.T.RAO, M.SUNDARALINGAM

Solvent radius = 1.40. Atomic radii in Table 7.

Surface area measured in Angstroms-squared.



Type 



Max 



Sigma 



exposed ( fraction! 




c. 



10 



15 



20 



2^- 



Table 7: Atomic radii
Angstroms

O carbonyl     1.52
Other atoms    1.80



Table 8

Fraction of DNA molecules having
n non-parental bases when
reagents that have fraction
M of parental nt.

M      .9965     .97716    .92612              .79433    .63096
f0     .9000     .5000     .1000     .0100     .0010     .000001
f1     .09499    .35061    .2393     .04977              .0000175
f2     .00485    .1188     .2768     .1197     .0292     .000149
f3     .00016    .0259     .2061     .1854     .0705     .000812
f4     .000004   .00409    .1110     .2077     .1232     .003207
f8     0.        2x10^-7   .00096    .0336     .1182     .080155
f16    0.        0.        0.        5x10^-7   .00006    .027231
f23    0.        0.        0.        0.        0.        .0000089
most   0         0                   5         7         12



"most" is the value of n having the highest 
probability. 
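
The entries of Table 8 are consistent with a binomial model, which can be sketched as follows. This is an illustration under stated assumptions, not the original calculation: each base is taken to be parental independently with probability M, and the number of varied bases (30) is inferred from the f0 row, since M raised to the 30th power reproduces those values; the table itself does not state it. The function names are hypothetical.

```python
from math import comb


def fraction_with_n_changes(n: int, length: int, m: float) -> float:
    """Binomial probability that exactly n of `length` bases are
    non-parental when each base is parental with probability m."""
    return comb(length, n) * (1.0 - m) ** n * m ** (length - n)


def most_probable_n(length: int, m: float) -> int:
    """Value of n with the highest probability (the 'most' row)."""
    return max(range(length + 1),
               key=lambda n: fraction_with_n_changes(n, length, m))


LENGTH = 30  # assumption: inferred from f0 = M**30, not stated in Table 8

if __name__ == "__main__":
    for m in (0.9965, 0.97716, 0.63096):
        row = [fraction_with_n_changes(n, LENGTH, m) for n in (0, 1, 2, 3, 4)]
        print(m, ["%.6f" % f for f in row], "most =", most_probable_n(LENGTH, m))
```

Because the tabulated M values are rounded, the computed fractions agree with the table only to about three significant figures.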



m 

IS 
Table 9: best vgCodon

Program "Find Optimum vgCodon."

INITIALIZE-MEMORY-OF-ABUNDANCES
DO ( t1 = 0.21 to 0.31 in steps of 0.01 )
. DO ( c1 = 0.13 to 0.23 in steps of 0.01 )
. . DO ( a1 = 0.23 to 0.33 in steps of 0.01 )
Comment calculate g1 from other concentrations
. . . g1 = 1.0 - t1 - c1 - a1
. . . IF( g1 .ge. 0.15 )
. . . . DO ( a2 = 0.37 to 0.50 in steps of 0.01 )
. . . . . DO ( c2 = 0.12 to 0.20 in steps of 0.01 )
Comment Force D+E = R + K
. . . . . . g2 = (g1*a2 - .5*a1*a2)/(c1 + 0.5*a1)
Comment Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF( g2.gt.0.1 .and. t2.gt.0.1 )
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . end_IF_block
. . . . . . end_DO_loop ! c2
. . . . . end_DO_loop ! a2
. . . . end_IF_block ! if g1 big enough
. . . end_DO_loop ! a1
. . end_DO_loop ! c1
. end_DO_loop ! t1
WRITE the best distribution and the abundances.
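
The search outlined in Table 9 might be sketched as below. Two points are assumptions rather than statements of the original program: the third base is taken to be an equimolar T/G mixture, and COMPARE-ABUNDANCES-TO-PREVIOUS-ONES is taken to keep the distribution with the largest ratio of least-favored to most-favored amino acid. The helper names are hypothetical. The distribution used in the usage example is illustrative only; it is not given in Table 9 but reproduces the abundances listed in Table 10.

```python
BASES = "TCAG"
# Standard genetic code with codons ordered T, C, A, G at each position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {a + b + c: AMINO[16 * i + 4 * j + k]
          for i, a in enumerate(BASES)
          for j, b in enumerate(BASES)
          for k, c in enumerate(BASES)}


def abundances(p1, p2, p3):
    """Expected fraction of each amino acid ('*' = stop) for a variegated
    codon; p1, p2, p3 map base -> fraction at codon positions 1, 2, 3."""
    out = {}
    for codon, aa in CODONS.items():
        f = (p1.get(codon[0], 0.0) * p2.get(codon[1], 0.0)
             * p3.get(codon[2], 0.0))
        out[aa] = out.get(aa, 0.0) + f
    return out


def lfaa_over_mfaa(ab):
    """Assumed figure of merit: least- over most-favored amino acid."""
    aa = [v for k, v in ab.items() if k != "*"]
    return min(aa) / max(aa)


def find_optimum():
    """Grid search over the ranges of Table 9 (integer steps avoid drift)."""
    p3 = {"T": 0.5, "G": 0.5}  # assumption: equimolar T and G at base 3
    best_ratio, best = 0.0, None
    for i in range(11):                      # t1 = 0.21 .. 0.31
        t1 = 0.21 + 0.01 * i
        for j in range(11):                  # c1 = 0.13 .. 0.23
            c1 = 0.13 + 0.01 * j
            for k in range(11):              # a1 = 0.23 .. 0.33
                a1 = 0.23 + 0.01 * k
                g1 = 1.0 - t1 - c1 - a1
                if g1 < 0.15:
                    continue
                p1 = {"T": t1, "C": c1, "A": a1, "G": g1}
                for m in range(14):          # a2 = 0.37 .. 0.50
                    a2 = 0.37 + 0.01 * m
                    # Force D+E = R+K
                    g2 = (g1 * a2 - 0.5 * a1 * a2) / (c1 + 0.5 * a1)
                    for n in range(9):       # c2 = 0.12 .. 0.20
                        c2 = 0.12 + 0.01 * n
                        t2 = 1.0 - a2 - c2 - g2
                        if g2 > 0.1 and t2 > 0.1:
                            p2 = {"T": t2, "C": c2, "A": a2, "G": g2}
                            ratio = lfaa_over_mfaa(abundances(p1, p2, p3))
                            if ratio > best_ratio:
                                best_ratio, best = ratio, (p1, p2)
    return best_ratio, best


if __name__ == "__main__":
    # Illustrative distribution (an assumption, see lead-in).
    ab = abundances({"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30},
                    {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22},
                    {"T": 0.5, "G": 0.5})
    for aa in sorted(ab):
        print(aa, "%.2f%%" % (100 * ab[aa]))
    print(find_optimum())
```

The constraint on g2 follows from setting the abundance of D+E (g1*a2) equal to that of R+K (c1*g2 + 0.5*a1*g2 + 0.5*a1*a2) when the third base is half T and half G.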



Table 10: Abundances obtained
from optimum vgCodon

Amino                       Amino
acid    Abundance           acid    Abundance
A       4.80%               C       2.86%
D       6.00%               E       6.00%
F       2.86%               G       6.60%
H       3.60%               I       2.86%
K       5.20%               L       6.82%
M       2.86%               N       5.20%
P       2.88%               Q       3.60%
R       6.82%               S       7.02% mfaa
T       4.16%               V       6.60%
W       2.86% lfaa          Y       5.20%
stop    5.20%



ratio = Abun(W)/Abun(S) = 0.4074

j    (1/ratio)^j    (ratio)^j       stop-free
1      2.454        .4074           .9480
2      6.025        .1660           .8987
3     14.788        .0676           .8520
4     36.298        .0275           .8077
5     89.095        .0112           .7657
6    218.7          4.57 x 10^-3    .7259
7    536.8          1.86 x 10^-3    .6881
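
The three columns above follow from the Table 10 abundances by simple powers. The sketch below assumes that j counts the number of variegated codons, that "stop-free" is the fraction of sequences with no stop codon in any of the j codons, and that the abundances of W, S and stop are 2.86%, 7.02% and 5.20%. The function name is hypothetical.

```python
ABUN_W, ABUN_S, ABUN_STOP = 0.0286, 0.0702, 0.0520  # from Table 10
RATIO = ABUN_W / ABUN_S                              # lfaa / mfaa


def variegation_row(j: int):
    """Return ((1/ratio)^j, ratio^j, stop-free fraction) for j codons."""
    return (1.0 / RATIO) ** j, RATIO ** j, (1.0 - ABUN_STOP) ** j


if __name__ == "__main__":
    print("ratio = %.4f" % RATIO)
    for j in range(1, 8):
        over, under, stop_free = variegation_row(j)
        print("%d  %9.3f  %.3e  %.4f" % (j, over, under, stop_free))
```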



Table 11: Calculate worst codon.

Program "Find worst vgCodon within Serr of given distribution."

INITIALIZE-MEMORY-OF-ABUNDANCES
Comment Serr is % error level.
READ Serr
Comment T1i,C1i,A1i,G1i, T2i,C2i,A2i,G2i, T3i,G3i
Comment are the intended nt-distribution.
READ T1i, C1i, A1i, G1i
READ T2i, C2i, A2i, G2i
READ T3i, G3i
Fdwn = 1.-Serr
Fup = 1.+Serr
DO ( t1 = T1i*Fdwn to T1i*Fup in 7 steps)
. DO ( c1 = C1i*Fdwn to C1i*Fup in 7 steps)
. . DO ( a1 = A1i*Fdwn to A1i*Fup in 7 steps)
. . . g1 = 1. - t1 - c1 - a1
. . . IF( (g1-G1i)/G1i .lt. -Serr)
Comment g1 too far below G1i, push it back
. . . . g1 = G1i*Fdwn
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . . end_IF_block
. . . IF( (g1-G1i)/G1i .gt. Serr)
Comment g1 too far above G1i, push it back
. . . . g1 = G1i*Fup
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . . end_IF_block
. . . DO ( a2 = A2i*Fdwn to A2i*Fup in 7 steps)
. . . . DO ( c2 = C2i*Fdwn to C2i*Fup in 7 steps)
. . . . . DO ( g2 = G2i*Fdwn to G2i*Fup in 7 steps)
Comment Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF( (t2-T2i)/T2i .lt. -Serr)
Comment t2 too far below T2i, push it back
. . . . . . . t2 = T2i*Fdwn
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . . end_IF_block
. . . . . . IF( (t2-T2i)/T2i .gt. Serr)
Comment t2 too far above T2i, push it back
. . . . . . . t2 = T2i*Fup
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . . end_IF_block
. . . . . . IF( g2.gt.0.0 .and. t2.gt.0.0 )
. . . . . . . t3 = 0.5*(1.-Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5*(1.+Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . end_IF_block
. . . . . . end_DO_loop ! g2
. . . . . end_DO_loop ! c2
. . . . end_DO_loop ! a2
. . . end_DO_loop ! a1
. . end_DO_loop ! c1
. end_DO_loop ! t1
WRITE the WORST distribution and the abundances.
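
The "push it back" step that Table 11 applies to g1 and t2 can be isolated as a small sketch. The function name and argument layout are hypothetical; the logic assumed is the one in the pseudocode: when the derived fraction deviates from its intended value by more than Serr (relative), it is clamped to the nearer limit and the other three fractions are rescaled so that all four again sum to one.

```python
def push_back(derived, intended, others, serr):
    """Clamp `derived` to within +/- serr (relative) of `intended` and
    rescale `others` so that derived + sum(others) == 1."""
    lo = intended * (1.0 - serr)   # intended * Fdwn
    hi = intended * (1.0 + serr)   # intended * Fup
    clamped = min(max(derived, lo), hi)
    if clamped != derived:
        factor = (1.0 - clamped) / sum(others)
        others = [x * factor for x in others]
    return clamped, list(others)


if __name__ == "__main__":
    # Hypothetical example: intended G1 = 0.30, 5% error level, and
    # t1, c1, a1 all at their upper limits, leaving g1 too low.
    t1, c1, a1 = 0.273, 0.189, 0.273
    g1, (t1, c1, a1) = push_back(1.0 - t1 - c1 - a1, 0.30, [t1, c1, a1], 0.05)
    print(g1, t1, c1, a1, g1 + t1 + c1 + a1)
```

Note that after rescaling, the other three fractions may themselves fall slightly outside their own error band; the pseudocode accepts this as well.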



-A1 





Table 13: BPTI Homologues




1j 



0 



155 



Table 13, continued. 
R if 20 21 23 24 25 26 27 23 29 30 31 32 33 

-3 - - . : - ^ 

-2 2-L ZRK---RR-ET 
-1 P- QDON---QK-RT 

1 RRH HRRIKTRRRCD 

2 RPRPPPNEVHHPFL 

3 KYTKKTCDA.RPDLP 
^ LAFFFFDSAODFDI 

5 c-ccccccccccccc 

6 lEKYYNEQt/DDLTE- 

7 LLLLLLLLLKKESQ 
3 HrpPPLPCPPPPPA 
9 RVA AAPKYVPPPpfG 

10 NAE DDEVSIDD YVD 

11 PA PPPTVARKTTTA 

12 CCCGGCCCCCKGCC 

13 R P P R R R P P P N* I p p L 
1"* CCCCCCCCCCCCCC 
15 YHKKLNRMR--KR F 
1^ DFAAAAAGACQAAG 

17 KFSHYLRMFPTKGY 

18 IIIIMIFTIVVMFM 
15 PSPPPPPSQRRIKK 

20 AAARRARRLAARRL 

21 FFFFFFYYWFFYYY 

22 VYYYYYY FAYYFNS 

23 VVYYYYYYFYYYYY 
2 N S K D N f< N D D K ^^ M N 

25 QkwsPSSCA T PATQ 

26 KCA AAHSTVRSKRE 
2"^ KAASGLSSKLAATT 

28 KNKNNHKMGKKCKK 

29 QKKKKKRAKTRFQN 

30 cc .cc cccccccccc 

31 E. YQNEQEEVKVEEE 

32 RPLKKKKTLAQTPC" 

33 FFFFFFFFFFFF FF 
3- DTHIINIQPQRVKI 
35 WYYYYYYYYYYYYY 

S5CGGCGGGRGGGG 

^'^ CCCGGGCCCCGGGC 

38 CCCCCCCCCCCCCC 

-5 GRKPRCCMQDDKKQ 

^0 CCCGCCGGCCGACG 

-1 N N N W N N N N D D K N N 

"^2 SAAAAAACCHHSGD 

■^3 N N N- M K N N N N C C i; N N 





'A 
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Tabic 13, continued. 



R I 

44 
45 
46 
47 
4S 
49 
50 
51 
52 
53 
54 
55 
56 
57 
58 
59 
60 



20 
R 
F 
K 
T 
I 
E 
E 
C 
R 
R 
T 
C 
T 
C 



21 
R 
F 
K 
T 

r 

E 
E 
C 
R 
R 
T 
C 
V 
V 



23 24 
N M 



25 26 27 
N N N 
F 
M 
T 
L 
E 
E 
C 
E. 
K 
V 
C 
G 
V 
R 
S 



F 

K 
T 

r 

D 
E 

c 
Q 
R 
T 
C 
A 
A 
K 
Y 
C 



28 29 30 
K H N 
F F 
Y K 
S T 
E 
T 
L 
C 
R 
C 
E 
C 
L 
V 
Y 
P 



E 
K 
E 
C 
R 
E 
Y 
C 
C 
V 

? 

G 
0 



31 32 33 
N K R 
F Y 
R K 
S S 
A E 
Q 
*D 
C 
L 
D 
A 
C 
S 

c 



F 

s 

T 
L 
A 



0 
K 
C 
I 
N 



20  Dendroaspis angusticeps (Eastern Green Mamba) C13 S2 C3 toxin (DUFT85)
21  Dendroaspis polylepis polylepis (Black mamba) B toxin (DUFT85)
22  Dendroaspis polylepis polylepis (Black Mamba) E toxin (DUFT85)
23  Vipera ammodytes TI toxin (DUFT85)
24  Vipera ammodytes CTI toxin (DUFT85)
25  Bungarus fasciatus VIII B toxin (DUFT85)
26  Anemonia sulcata (sea anemone) 5 II (DUFT85)
27  Homo sapiens HI-14 "inactive" domain (DUFT85)
28  Homo sapiens HI-14 "active" domain (DUFT85)
29  beta bungarotoxin B1 (DUFT85)
30  beta bungarotoxin B2 (DUFT85)
31  Bovine spleen TI II (FIOR85)
32  Tachypleus tridentatus (Horseshoe crab) hemocyte inhibitor (NAKA87)
33  Bombyx mori (silkworm) SCI-III (SASA84)



Notes:
a) both beta bungarotoxins have residue 15 deleted.
b) B. mori has an extra residue between C5 and C14; we
   have assigned F and G to residue 9.
c) all natural proteins have C at 5, 14, 30, 38, 51, & 55.
d) all homologues have F33 and G37.
e) extra C's in bungarotoxins form interchain cystine
   bridges



f '.V. 



Si 



f * " 



r- 




Res. 
-5 
-< 
-3 
-2 
- I 

1 

2 



5 
6 
7 
8 
9 
10 
11 
12 
13 
lA 
15 
16 
17 
13 
19 
20 
21 
22 
23 
24 
25 
26 
27 
23 ■ 
29 
30 
31 
32 
33 
34 
35 
36 
37 
38 
39 
Table 15: Amino acids observed at each Residue
BPTI homologues



Oif f ernet 
AAs 
2 
2 
5 
10 
10 
10 
9 
10 
7 
1 
10 
■ 5 
7 
9 
10 
10 
2 
5 
3 
12 
7 
12 
6 
7 
5 
4 
6 
.2 
4 
10 
9 
5 
7 
10 
1 
7 
11 
1 
11 



Contents 
D -32 
E -32 

T F F Z -29 

Z3 R3 Q2 T2 H C L K E -18 
DC T2 P2 Q2 E C N K R -13 
R2 1 A2 K2 H2 P L r T C D 



V F L 



A L 



F2 0 R4 A2 H2 U E 
D15 K6 T3 R2 P2 S Y C 
F13 D4 L3 Y2 12 A2 S 
C33 

Lll KS tlA K3 02 12 Y2 D2 T 

LIS Ell K2 S Q 

P26 H2 A2 I L G F 

P17 A6 V3 R2 Q L K Y F 

Yll £7 D4 A2 112 R2 V2 S I 

T17 P5 A3 R2 I S Q Y V K 

C22 K 

P22 R6 L3 N I 
C31 T A 

K15 R4 Y2 M2 L2 -2 V G A 
A22 05 Q2 R K 0 F 
R12 K5 A2 Y3 H2 S2 F2 L M 
121 M4 F3 L2 V2 T 
111 P'lO R6 S2 K2 L Q 
RIO A7 S4 L2 Q 
Via ri3 W I 
F14 V1-; H2 A U S 
Y32 F 

I/2€ K3 D3 S 

A12 S5 Q3 P3 W2 L2 T2 K C 

K16 A6 T2 E2 32 R2 C H V 

A18 S3 K3 L2 T2 

013 KIO !/5 Q2 R H M 

LO Q7 K7 A2 F2 .R2 M C T H 

C33 

Q12 Ell LA K2 V2 Y N 
T12 PS K.l Q3 E2 L2 
F33 

Vll 18 T3 D2 tt2 Q2 
Y31 W2 
027 S5 R 
C33 

C31 T A 

R13 09 K4 Q3 02 P M 



I U F 



TOP 



0 V S R A 
F H P R K 



BPTI 



R 
P 
D 
F 
C 
L 
E 
P 
P 
Y 
T 
C 
P 

c 

K 
A 
R 

T 
I 

R 
V 
F 
Y 
M 
A 
K 
A 
C 
L 
C 
Q 
T 
F 
V 
Y 
C 
C 



e-vv*;.*...- . 




A 
Table 16: Exposure in BPTI
Coordinates taken from
Brookhaven Protein Data Bank entry 6PTI.

HEADER    PROTEINASE INHIBITOR (TRYPSIN)        13-MAY-87
COMPND    BOVINE PANCREATIC TRYPSIN INHIBITOR
COMPND  2 (/BPTI$, CRYSTAL FORM /III$)
AUTHOR    A.WLODAWER

Solvent radius = 1.40
Atomic radii given in Table 7

Areas in Angstroms-squared.

                       Not                   Not
            Total      covered               covered
Residue     area       by M/C     fraction   at all     fraction



6PTI 




■ 2C1 



Table 16, continued.

                       Not                   Not
            Total      covered               covered
Residue     area       by M/C     fraction   at all     fraction

PHE  33     304.27      59.79     0.1965      18.91     0.0622
VAL  34     251.56     109.78     0.4364      42.36     0.1684
TYR  35     332.64      80.52     0.2421      15.05     0.0452
GLY  36     187.06      11.90     0.0636       1.97     0.0105
GLY  37     185.28      84.26     0.4548      39.17     0.2114
CYS  38     234.56      73.64     0.3139      26.40     0.1125
ARG  39     417.13     304.62     0.7303     250.73     0.6011
ALA  40     209.53      94.01     0.4487      52.95     0.2527
LYS  41     314.60     166.23     0.5284     108.77     0.3457
ARG  42     349.06     232.83     0.6670     179.59     0.5145
ASN  43     266.47      38.53     0.1446       5.32     0.0200
ASN  44     269.65      91.03     0.3376      23.39     0.0867
PHE  45     313.22      69.73     0.2226      14.79     0.0472
LYS  46     309.83     217.18     0.7010     155.73     0.5026
SER  47     224.78      69.11     0.3075      24.80     0.1103
ALA  48     211.01      82.06     0.3889      31.07     0.1473
GLU  49     286.62     161.00     0.5617     100.01     0.3489
ASP  50     299.53     156.42     0.5222      95.96     0.3204
CYS  51     238.63      24.51     0.1027       0.00     0.0000
MET  52     293.05      89.43     0.3054      66.70     0.2276
ARG  53     356.20     224.61     0.6306     189.75     0.5327
THR  54     251.53     116.43     0.4629      51.64     0.2053
CYS  55     240.40      69.95     0.2910       0.00     0.0000
GLY  56     184.66      60.79     0.3292      32.78     0.1775
GLY  57     106.58      49.71     0.4664      38.28     0.3592
ALA  58     no position given in Protein Data Bank

"Total area"         is the area measured by a rolling sphere
                     of radius 1.4 A, where only the atoms
                     within the residue are considered. This
                     takes account of conformation.

"Not covered         is the area measured by a rolling sphere
by M/C"              of radius 1.4 A where all main-chain atoms
                     are considered. Fraction is the exposed
                     area divided by the total area. Surface
                     buried by main-chain atoms is more
                     definitely covered than is surface covered
                     by side group atoms.

"Not covered         is the area measured by a rolling sphere
at all"              of radius 1.4 A where all atoms of the
                     protein are considered.
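
The fraction columns of Table 16 follow directly from the definition just given (exposed area divided by total area). A minimal sketch, with a hypothetical function name and three rows taken from the table, is:

```python
def exposed_fraction(exposed_area: float, total_area: float) -> float:
    """Exposed area divided by the total area of the isolated residue."""
    if total_area <= 0:
        raise ValueError("total area must be positive")
    return exposed_area / total_area


# residue: (total area, not covered by M/C, not covered at all),
# in Angstroms-squared, from Table 16
ROWS = {
    "PHE 33": (304.27, 59.79, 18.91),
    "GLY 36": (187.06, 11.90, 1.97),
    "ARG 39": (417.13, 304.62, 250.73),
}

if __name__ == "__main__":
    for residue, (total, not_mc, not_all) in ROWS.items():
        print(residue,
              "%.4f" % exposed_fraction(not_mc, total),
              "%.4f" % exposed_fraction(not_all, total))
```

Since the tabulated areas are rounded to two decimals, recomputed fractions may differ from the table in the last digit.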



Table 17: Plasmids used in Detailed Example



Phage 
LCI 

pLC2 

pLC3 
pLC4 



pLG5 
pLG6 
pLC7 

pLca 

pLC9 
pLGiO ■ 
dLCII' 



fponter.ts 

Hl3npl8 with II/Aoi II/Aco 

ll/^au I adaptor 

LCI with azi}^' -and ColEl of.pBR:22 cloned 

into Ant I sites 

pIjC2 with /vcc I site removed 

pLC3 with firr.t part of o gp-pbd qcne 

cloned into Hsr Il/Sau I sites, 

Avr Il/Asu ir sites created 

pLG4 with second part of 05£z^d qcne 

cloned into fi^ II/Asu II sites, I 

site created 

pLGS with third part of psp-pbd qcne 
cloned into Asu II/B^sH I sites. Qte I 
site created 

pLG6 vith last part of or.p-abd qone 
cloned into Hh:? I/Asu.H sites 
pLC7 vith dis^ible-d psp-phd qene, sa-.e 
lenqth 

pLC7 nucatcd to display DPTI (ViSfipTi ) 
pLCa ^ tet ^ r;cnc - ari2'' gene 



prjG9 



ne - -inn** gone 



c 



4 



T 

r 

''A 
Table 18: Enzyme sites eliminated when
M13mp18 is cut by Ava II
and Bsu36 I



10 



15 



20 



25 



30 



35 



40 



Aha II 
FSP I 
EcoR I 
Sna I 
Hind III 
Hind II 



Aat II 
■ Bbv II 
PstB I 
Eco57 I 
Esq I 
Nhe I 
Pf IM I 
Rsr I 
Spe I 
'xca I 



Kar I 
Eg I I 
Sac I 
BanH I 
ACC I 



Cdi II 
Kg IE II 
Kpn I 
XbJ I 
Pilt I 



r^/u I 

OSU36 X 



Table 19: Enzymes not cutting
M13mp18



. All I 
Bcl I 
BstE II 
EcofI I 
Hpa .1 
N'ot I 
PmaC I 
Sac I 
Stu I 
Xho I 



Apa I 

dspm r 

BstX I 
ECOO109 I 
Mlu I 
^:r^ I 
Ppa I 
Sea I 
Sty I 



E^g r 

Mco I 

P::'jM I 
SIl I 

Ttrmi I 




* ' * t 





'if ■ ll 



■/ 

t 
Table 20: Enzymes cutting
AmpR gene and ori



10 



15 



Aat II 
Sea I 
Pvu I 
Hind II 
Nde I 



fibv II 
Ithill I 
I 

Pst I 



PC?? 7 I 

Aha II 
Rgl I 
Xbft I 



Ppa I 
Cdl XI 
HqiE II 
Af I III 



i 



1^ 




r 



C 





:g7 

Table 22: ipbd gene

pbd mod10 29III88 :
lacUV5 Rsr II/ Avr II/gene/TrpA attenuator/Mst II; !

5'- CGGaCCG TaT                                   ! Rsr II site
    CCAGGC tttaca CTTTATGCTTCCGGCTCG tataat GTG   ! lacUV5
    TGG aATTGTGAGCGGATAACAATT                     ! lacO operator
    CCT AGGAgg CtcaCT                             ! Shine-Dalgarno seq.
    atg aag aaa tct ctg gtt ctt aag gct agc       ! 10, M13 leader
    gtt gct gtc gcg acc ctg gta ccg atg ctg       ! 20
    tct ttt gct cgt ccg gat ttc tgt ctc gag       ! 30
    ccg cca tat act ggg ccc tgc aaa gcg cgc       ! 40
    atc atc cgt tat ttc tac aac gct aaa gca       ! 50
    ggc ctg tgc cag acc ttt gta tac ggt ggt       ! 60
    tgc cgt gct aag cgt aac aac ttt aaa tcg       ! 70
    gcc gaa gat tgc atg cgt acc tgc ggt ggc       ! 80
    gcc gct gaa ggt gat gat ccg gcc aaa gcg       ! 90
    gcc ttt aac tct ctg caa gct tct gct acc       ! 100
    gaa tat atc ggt tac gcg tgg gcc atg gtg       ! 110
    gtg gtt atc gtt ggt gct acc atc ggt atc       ! 120
    aaa ctg ttt aag aaa ttt act tcg aaa gcg       ! 130
    tct taa tag tga ggttacc                       !
    agtcta agcccgc ctaatga gcgggct tttttttt       ! terminator
    CCTgAGG -3'                                   ! Mst II




• i: 



1^ 



J 

1 



9 



O 



Table 23: ipbd DNA sequence

DNA Sequence file = UV5_M13PTIM13.DNA;17
DNA Sequence title =
pbd mod10 29III88 : lac-UV5 Rsr II/Avr II/gene/TrpA
attenuator/MstII; !

  1 C|GGA|CCG|TAT|CCA|GGC|TTT|ACA|CTT|TAT|GCT|TCC|GGC|TCG|
 41 TAT|AAT|GTG|TGG|AAT|TGT|GAG|CGG|ATA|ACA|ATT|CCT|AGG|AGG|
 83 CTC|ACT|ATG|AAG|AAA|TCT|CTG|GTT|CTT|AAG|GCT|AGC|GTT|GCT|
125 GTC|GCG|ACC|CTG|GTA|CCG|ATG|CTG|TCT|TTT|GCT|CGT|CCG|GAT|
167 TTC|TGT|CTC|GAG|CCG|CCA|TAT|ACT|GGG|CCC|TGC|AAA|GCG|CGC|
209 ATC|ATC|CGT|TAT|TTC|TAC|AAC|GCT|AAA|GCA|GGC|CTG|TGC|CAG|
251 ACC|TTT|GTA|TAC|GGT|GGT|TGC|CGT|GCT|AAG|CGT|AAC|AAC|TTT|
293 AAA|TCG|GCC|GAA|GAT|TGC|ATG|CGT|ACC|TGC|GGT|GGC|GCC|GCT|
335 GAA|GGT|GAT|GAT|CCG|GCC|AAA|GCG|GCC|TTT|AAC|TCT|CTG|CAA|
377 GCT|TCT|GCT|ACC|GAA|TAT|ATC|GGT|TAC|GCG|TGG|GCC|ATG|GTG|
419 GTG|GTT|ATC|GTT|GGT|GCT|ACC|ATC|GGT|ATC|AAA|CTG|TTT|AAG|
461 AAA|TTT|ACT|TCG|AAA|GCG|TCT|TAA|TAG|TGA|GGT|TAC|CAG|TCT|
503 AAG|CCC|GCC|TAA|TGA|GCG|GGC|TTT|TTT|TTT|CCT|GAG|G

Total = 539 bases
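
A sequence such as the one in Table 23 can be checked mechanically: by translating the coding region and by locating recognition sites. The sketch below is illustrative only. The sequence literal is a transcription of the table as read here and should be verified against the original; the function names are hypothetical; the start of the coding region (base 89, index 88) is taken from the position of the first codon of the M13 leader.

```python
import re

IPBD = (
    "C GGA CCG TAT CCA GGC TTT ACA CTT TAT GCT TCC GGC TCG "
    "TAT AAT GTG TGG AAT TGT GAG CGG ATA ACA ATT CCT AGG AGG "
    "CTC ACT ATG AAG AAA TCT CTG GTT CTT AAG GCT AGC GTT GCT "
    "GTC GCG ACC CTG GTA CCG ATG CTG TCT TTT GCT CGT CCG GAT "
    "TTC TGT CTC GAG CCG CCA TAT ACT GGG CCC TGC AAA GCG CGC "
    "ATC ATC CGT TAT TTC TAC AAC GCT AAA GCA GGC CTG TGC CAG "
    "ACC TTT GTA TAC GGT GGT TGC CGT GCT AAG CGT AAC AAC TTT "
    "AAA TCG GCC GAA GAT TGC ATG CGT ACC TGC GGT GGC GCC GCT "
    "GAA GGT GAT GAT CCG GCC AAA GCG GCC TTT AAC TCT CTG CAA "
    "GCT TCT GCT ACC GAA TAT ATC GGT TAC GCG TGG GCC ATG GTG "
    "GTG GTT ATC GTT GGT GCT ACC ATC GGT ATC AAA CTG TTT AAG "
    "AAA TTT ACT TCG AAA GCG TCT TAA TAG TGA GGT TAC CAG TCT "
    "AAG CCC GCC TAA TGA GCG GGC TTT TTT TTT CCT GAG G"
).replace(" ", "")

BASES = "TCAG"
# Standard genetic code with codons ordered T, C, A, G at each position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {a + b + c: AMINO[16 * i + 4 * j + k]
          for i, a in enumerate(BASES)
          for j, b in enumerate(BASES)
          for k, c in enumerate(BASES)}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "N": "[ACGT]"}


def translate(dna: str, start: int) -> str:
    """Translate from 0-based index `start` up to the first stop codon."""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODONS[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def find_sites(dna: str, site: str) -> list:
    """1-based start positions of a recognition sequence (IUPAC allowed);
    the lookahead lets overlapping matches be reported."""
    pattern = "(?=" + "".join(IUPAC[b] for b in site.upper()) + ")"
    return [m.start() + 1 for m in re.finditer(pattern, dna)]


if __name__ == "__main__":
    print(len(IPBD), "bases")
    print(translate(IPBD, 88))
    for name, site in [("Rsr II", "CGGWCCG"), ("Xho I", "CTCGAG"),
                       ("BstE II", "GGTNACC")]:
        print(name, find_sites(IPBD, site))
```

Positions reported by `find_sites` are those of the first base of the recognition sequence, which need not coincide with the cut positions tabulated elsewhere.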



1' 



p 

m 




-1^ 



■5 



0 



Table 24 
Summary of Restriction Cuts



tAcc I 
Acc III 
ACV I 
Afl II 
\ M\ II 
Aha III 
Apa I 
AS0713 



I obsciT/ed sites : 
1 observed sites : 
L observed sites : 3 
1 obsen/ed sites : 
1 observed sites 
1 observed sites ; 
L observed sites : 1 
I observed sites : 
1 observed sites : 
1 observed sites : 
1 cbsen-'ed sites : 
3 observed sites : 

1 observed sites : 3 
I observed sites : 

I observed sites : 
1 obser-w'ed sites : 
1 obser'/ed sites : 
1 observed sites 
1 obser-^ed sites : 

2 observed sites : 2 

1 observed sites : 
1 obscr'-'ed sites : 

1 observed sites : 

2 obser'/ed sites : 

L obser'v'ed sites : 2 
1 obser/ed sites : 
I obser.'ed sites : 
.3 obser'/ed sites : 
1 observed sites 
I observed sites 
1 observed sites : 
L observed sites : 1 

2 obser'/ed sites : 
L observed sites : 4 
I cbser-zed sites : 3 
L obscr/ed sites : 4 
L observed sites : 1 
L obser/ed sites : 1 
s 1 obser'/ed sites 

1 observed sites : 
1- obscr\'ed sites : 
1 observed sites : 

1 obser'/ed sites : 
1 observed sites : 

2 observed sites : 

1 observed sites : 

I observed sites : 3 
L observed sites : 2 

2 observed sites : 



259 

162 
23 
109 
: 404 

292 
93 . 
133 
471 
175 

76 

138 328 540 

28 

352 

346 

3 19 

205 
: 493 

413 
99 350 

193 
277 
213 
299 
40 
323 
473 



350 



304 



i:S 323 
: 10 3 
: 377 
340 
38 

93 

04 
28 
13 
15 
23 
: 311 
332 
184 
193 

3 

535 

144 
351 
11 
40 
76 413 



540 



209 



A 




I' W.^'«'f'f-'','ri r',' -, 
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Table 25: Annotated Sequence of i pb-i gene 

5 ' - C I CCA I CCC I TAT | CCA | CCC I TTT \ ACA I CTT ( TAT I 
I Rsr II I I -35 ! 



I CCT I TCC I CCC I TCC 1 TAT | AAT | CTG I TCC 1 
I "10 L 



52 



/\AT I TCT I CAG j CCC | ATA | ACA | ATT I 
\i\Q Operator [ 



73 



I CCT I ACC I ACG I CTC | ACT ] 
I Avr III 



88 



1 
ATG 



k 
2 

AAG 



K S 
3 4 
AAA I TCT 



1 


V 


1 


)c 


a 


s 


5 


6 


7 


3 


9 


10 


CTC 


CTT 




A.',C 


CCT 


ACC 






Afi ri 


r/^.e I ' 



118 



V 

11 



a 
12 



I CCT 



V I a 
13 I 14 
CTC CCC 



t 
15 
ACC 



1 

16 
CTG 



V p 

17 I 12 i I 
CTaIcCcI ATCtCTC! 



19| 20| 



1 Kun \\ 



148 



c 



- -J 



21 
TCT 



! p 

31 
CCG 



t 

22 
TTT 



P 
32 
CCA 



a 

23 

ccr 



y 

33 
TAT 



r 
24 
CCT 



P 

25 
CCG 



l AccI Ul 



d I t 
26] 27 
GATtTTC|TCT 



I 




29 




CTC 


gag! 


AvA r 1 


Xho I ! 



PflM I 



t g 

34 35 
ACT I CCC 



P 

36 



3 7 



a 

39 



CCC I TCC I AAA CO 



r I 

401 

ccc; 



I Apci T ( 

Dr a 11 t 
Pss I i 



i 


1 


r 


y 


t 


y 


n 


a 


4 1 


42 


43 


44 


45 


46 


47 


48 


ATC 


ATC 


CCT 


TAT 


rrc 


TAG 


A.\C 


CCT 



49| 

Aj\AI 



178 



203 



235 





/ 



Q 
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a 

50 
CCA 



c 
61 
TCC 



s 
TCC 



q I I I c 

51 52 53 

ccc|ctc|tcc 

u 



61 I 64 
GCT I AAC 



r 

62 
CGT 



Table 


25. 


con 




q 


t 


f 


V 


y 


q q 


54 


55 


36 


57 


53 


59| 60 


CAC 


ACC 


TTT 


CTA 


TAC 


CCT 1 CCT 








ACC I 










Xca I 




r 




n 


f 


k 




65 


66 


67 




69 




CCT 


AAC 


/vAC 




AAy^ 





368 



205 



71 
GCC 



72 
CAA 



d 
73 
CAT 



c 
74 
TCC 



m 
75 
ATG 



r 

76 
CCT 



I Sph T| 



t I C I q I 
77| 78| 79| 
ACcj TCC I CCT I 





a 




e 


q 


d 


d 


80 


81 


82 


83 


84 


85 


86 


CCC 


CCC 


CCT 


CAA 


CCT 


CAT 


CAT 


Pbe I 


























P 






a 


a 






87 


88 


39 


90 


91 






CCC 


CCCi AAA CCG 


GCC 






I 


Sf ; 


r 









346 



361 



f 

92 
TTT 



1 j q 
95 96 
CTC I CAA 



a 


s 


a 


t 


97 


93 


99 


100 


CCT 


TCT 


CCT 


ACC 
Table 25, continued.

   i   v   g   a   t   i   g   i
  113 114 115 116 117 118 119 120
  ATC|GTT|GGT|GCT|ACC|ATC|GGT|ATC|                      448

   k   l   f   k   k   f   t   s   k   a
  121 122 123 124 125 126 127 128 129 130
  AAA|CTG|TTT|AAG|AAA|TTT|ACT|TCG|AAA|GCG|
                          | Asu II |

   s
  131 132 133 134
  TCT|TAA|TAG|TGA|GGT|TAC|CAG|TCT|                      502
                  | BstE II |

 |AAG|CCC|GCC|TAA|TGA|GCG|GGC|TTT|TTT|TTT|              532
 | Trp terminator                         |

 |CCT|GAG|G -3'                                         539
 | Sau I   |

Note the following enzyme equivalences.

  Xma III  =  Eag I
  Acc III  =  BspM II
  Dra II   =  EcoO109 I
  Asu II   =  BstB I
  Sau I    =  Bsu36 I




- / 



/ 



.ON 
Table 27: DNA_synth1
5' I CCC I TCC I CTC t CCA ! CCC I TAT ! CCA | CCC I TTT t ACA I CTT I TAT { 
I CCT I TCC I CCC I TCG I TAT I AAT I CTC I TGC ) 



! AAT I TCT 1 GAG I CCC I AT A ' ACA ) ATT ! 

olig?-* = 3'- taa 



ICCTl AGC[ 
gga tec 



/ 3' = oliqn 
I GCC I OCT I CCT [ TCG ' AAA | CCG | 
c^g cga gga age tt t cgc 



I TCT I TAA i TAG [ TC A | GCT \ T AC | CAC 1 TCT | 
aga af.t ate act cca atg gtc aga 



I AAG I CCC [ CCC I TAA [ TGA ] GCC | CGC | TTT | TTT j TTT J 
ttc ggg egg att act cgc ccg c-».aa aaa aaa 



I CCT I GAG I GCA | GCT [ GAG | CO 
gga etc cgt cca etc yc - 5' 



"Top" strand 09 

"Bottom" strand 100 

Overlap 23 (iA c/g and 9 a/ t) 

Net length 158 




Tit/: 
Table 28: DNA_seq2



5'- [gca[ccalacg 





CCT I ACC I ACG I CTC | ACT | 
Ave U \ 

I ?. Pr I 



I 

ATC 



k 


k 




V 


1 1 k 


a 


s 


2 






6 


7 8 


9 


10 


AAC. 


AAA 


TCT| CTG 


GTT 


CTt| AAC 


CCT 


ACC 










Afi rr 


tiho r 



I V a 
t 11 12 
! -TIT CCT 



vat 
13 14 15 

ctc|ccg|acc 
! Mm XI 



1 

16 
CTC 



V [ p n 
17 18 19 
CTA CCG ATC 



1 

20 
CTG 




1 ^ 


1 ^ 


a 


r P 


d 


t 


c 


2 1 


22 


23 


24 25 


26 


27 


28 


1 TCT 1 TTT 


CCT 


CCT CCG 


CAT 


TTC 


TGT 








lAccITII 






P 


P 


y 


t 1 g 


P 


c 


k 


31 


32 


33 


3 4 35 


36 


37 


33 


CCG 


CCA 


TAT 


ACT CCG 


CCC 


TCC 


AA.\ 




PflM r ( 









1 

29 
CTC 
Ava 



e- 
30 
GAG 
I 



Apfl T I 



a 
39 
CCC 

n 



CCC I 
H lit 



Dra II 



Pss I 



i 


r 


42 


43 


ate 


cqt 



t I 5 I k I 

127)i:8ll29 

ACT I TCC I AAa | gcg | get | gcg ( 
JAsu Til r.pflc or i 
Table 29: DNA_synth2
5'- [GCAlCCAl ACCI 
\ CCT \ XC,r.\ AGa\CTC\ ACT \ 
[ ATC i AAC I AAA I TCT I CTG I C7T I CTT I AAC 1 GCT I AGC ! 



I CTT| GCT I GTC I CtC I ACC 1 CTG I GTA f CCC I ATG ' CTG I 
olignS = 3'- ggc tac gac 



/ 3 ' = olig* 5 
I TCT I T T T I GCT t CGT i CCG I GAT 1 TTC | TGT | CTC | GAG | 
aga aaa ego gca ggc eta aag aca gag etc 



I CCG 1 CCA 1 TAT \ ACT [GCG | CCC | TCC | A/^^ | GCC [ CGC | 
ggc ggt ata tga ccc gao acg t^t cgc gcg 



I ATC I ATC I CGT I 
tag tag gca 



I ACT I TCC I \ CCC I GCT | GCG | 
■tga age ttt cgc cga cgc - 5' 



"Top" strand 
."Bottom" strand 
Overlap 
Net- length 



99 
99 

24 (lA c/g and 10 a/t) 
155 



- . .5 



SI: 
Table 30: DNA_seq3



















a 


r 




















39 


40 












5'- 


1 ccc 


lege 


1 aca 


GCG 


CCC 














1 sD.^cer 


PssH ri 




1 


i 


,r 


. y 


l.'s 


1 y 


1 " 


1 ^ 


k 






41 


42 


43 


44 




1 46 


1 47 


! ^8 


49 






ATC 


ATC 


CCT 


■tAT|TTC|TAC| A/\C 


CCT 


AAA 






a 


g 


1 


c 


q 


t 


f 


V 


y 




q 


SO 


51 


52 


53 


54 


55 


56 


57 


58 


59 


60 


cca| 


GGC 


CTC 


TGC 


CAC 


ACC 




GTA 


TAC 


GOT 


CCT 


I 














ACC 1; 




















Xca r 







c 
61 
TGC 



s 

70 
TCG 



r 

62 
CGT 



a 

63 
GCT 



64 
AAG 



ESP I 



r 

65 
CCT 
1 



n n 
661 67 
AAC| AAC 



f 

68 
TTT 



a 

71 
CCC 



e 

72 
GAA 



d 
73 
GAT 



c 

74 
TGC 



75 
ATC 



76 77 
CCT ACC 



I sph t;! 



69 
AAA 



C I q 
78 79 
TGC I GCT 



80 
GGC 

■ Bbe 



a 

81 
GGC 
I 



gct|gaa| 
spacer 



ttt 



t 


s 


K 


127 


128 


129 


acT 


TCG 


AAa 


lAsu ni 



gcg|tcg|ccg| - 3' 



k3 

h-..--'| 

m 

H 



m 

m 




Table 31: DNA_synth3



5 ' - i CCC I TCC I ACA I CCG I CCC I 
1 ATC I ATC I CGT 1 TAT ' TTC I TAC ! A AC I CCT I AAA [ 



I CCA I CCC I CTG I TGC ' CAC I ACC I TTT ! GTA I TAC I GOT ! GGT f 
oiig^a = 3'- g cca cca 

/ 3' = olig^7 
I TGC [ CCT I CCT I -VVG I CCT I AAC i AAC \ TTT I AAA | 
acg gca cga ttc gca teg ttg aaa ttt 



I TCC I CCC I CAA I CAT I.TCC | ATC | CCT | ACC | TGC ( CCT | 
age egg ctt eta acg tac gca tgg acg cca 

I CCC I CCC I CCT I CAA I 
ccg egg cgt ctt 



I TTT I ACT I TCC | AAA i CCC j TCC I CCC j' 
aaa.tga age ttt egc age ggc -5' 



"Top" strand 93 

"BottOQ" strand 97 

Overlap 25 (15 g/c & 10 a/t) 

Net length 1^6 



o 



Table 32: DNA_seq4









g 


a 


a 


e 


g 


d 


1 d 


5' 






80 


81 


82 


33 


64 


85 


1 86 


Icctlcgc 


Icct 


GGC 


GCC 


GCT 


GAA 


GOT 


CAT 


|CAT 


I soacer 


Bbe T 


















Mar I 














P 


a 






a 












87 


88 


85 


90 


SI 












CCC 


GCC 


AAA 1 GCC 


GCC 












! 


Sfi r 
















f 


n 


s 


1 


q 




s 


a 


t 




92 


93 


94 


95 


96 


97 


98 


99 


100 




TTT 


AAC 


TCT 


CTC 


CAA 


GCT 


TCT 


GCT 


ACC 



|Hind 31 



e 
101 
GAA 



y 

102 
TAT 



1 

103 
ATC 



g I y 



104 105 



'1' 

GGT 1 TAC 



106 
GCG 



I XI u II 



107 
TGC 




M 



Table 33: DNA_synth4
5' I GCT 1 CGC I CCT I CCC I CCC f GCT I C.\A t OCT I CAT } 0 AT ' 
|CCC(CCC|AA A[CCG|CCCi 

i TTT I AAC I TCT I CTC I CA A I CCT ! TCT ' CCT t ACC I 



IGAAITAT! AT ClCCTlTAC'GCClTCG I 
oligglO = 3'- ata tag cca atg cgc acc 



/ 3' = oligjO 
I GCC I ATG I C TC | CTC | CTT [ 
egg tac cac cac caa 



I ATC I CTT [ CCT [ CCT ] ACC ] ATC | CCT | ATC } 
tag caa cca cga tgg tag cca tag 



I AAA I CTC I TTT | AAC | AAA | TTT | ACT | TCC | AAA | CCG | TCT j TCA | 
ttt gac aaa ttc ttt aaa tga age ttt cgc aga act - 5' 



"Top" strand 100 
"Bottom" strand 93 

Overlap 25 (14 e/g and 11 a/t) 

Net length 149 



m 

m. 



m 




•Tf ■• 




Res. 



-3 
-4 
-3 
-2 
-1 
1 
2 
3 
4 
5 
6 
7 
3 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
26 
27 
28 
29 ' 
.30 
31 
32 
33 
34 
35 
36 
37 
38 
39 
Table 34: Some interaction sets in BPTI



Numbei 
Oiff . 
AAs 



Contents 



BPTI 1 2 3 4 5 



2 
2 
5 

10 

10 

10 
9 

10 

7 
1 
10 

5 

7 

9 
10 
10 

2 

5 

3. 
12 

7 
12 

6 

7 

5 

4 

6 

2 

4 
10 

9 

5 

7 
10 
- 1 

7 
11 

1 
11 

2 

3 ' 
1 
3 
7 



D -32 
E -32 

T P F Z -29 

Z3 R3 Q2 T2 H C L K E -18 
D4 T2 P2 Q2 E C tl K R -18 
R21 A2 K2 H2 P L r T C D 
P20 R4 a: 112 N £ V F L 
015 K6 T3 R2 P2 S Y C A L 
F19 04 L3 Y2 12 A2 S 
C3 3 

Lll E5 N4 
L18 Ell K2 



K3 Q2 12 Y2 D2 T 
S Q 



P17 A6 V3 R2 Q L K Y F 
Yll £7 D4 A2 N2 R2 V2 S I D 
T17 P5 A3 R2 I S Q Y V K 
G32 K 

P2 2 R6 L3 U I 
C31 T A 

K15 R4 Y2 M2 L2 -2 V G A I K F 
A22 G5 Q2 R K D F 
R12 K5 A2 Y3 H2 S2 F2 
121 M4 F3 L2 V2 T 
111 PIO R6 S2 K2 L Q 
R19 A7 S4 L2 Q 
Y18 F13 W I 
F14 YI4 H2 A H S 
F 

S 

P3 W3 L2 T2 



L M T G P 



Y32 

N2C K3 D3 

A12 S5 Q3 P3 W3 L2 T2 K G 
K16 A6 T2 E2 S2 R2 G H V 
A18 S8 K3 L2 T2 
C13 KIO N5 Q2 R U H 
L9 Q7 K7 A2 F2 R2 M C T N 
C33 

QI2 Ell L4 K2 V2 Y tJ 
T12 P5 K4 Q3 E2 L2 G 



V S R A 



F3 3 

Vll 18 T3 D2 N2 Q2 F H P R K 
Y31 W2 
C27 S5 R 
G33 

C31 T A 

R13 G9 K4 Q3 02 P M 



R 
P 
0 
F 
C 
L 
E 
p 

P 

Y 

C 
P 
C 
K 
A 
R 
I 
I 
R 
Y 
F 
Y 
N 
A 
K 
A 
C 
L 
C 
Q 
T 
F 
V 
Y 
G 
C 

c 

R 



X 

s 5 

4 3 




c 
Table 36, continued.
Distances in Angstroms between Cbetas.
Hypothetical Cbeta added to each Glycine.





E49 


M52 


P9 


Til 


K15 


A16 


118 


R20 


F2 2 


II24 




6i 


1 
































P9 


17 . 


7 


15.5 






























Til 


22. 


1 


21.5 


7.2 




























K15 


27. 


5 


28.7 


16.4 


9. 


5 
























A16 


22. 


2 


24.2 


14.9 


9. 


8 


6. 


2 




















118 


17. 


4 


19.5 


12.2 


9. 


5 


10. 


4 


4 . 


9 
















R20 


13. 


0 


13.8 


8.0 


9. 


4 


14 . 


9 


10. 


6 


6. 


2 












722 


13. 


e 


11.4 


4 . 1 


10. 


6 


19. 


1 


16. 




12. 


7 


O . 7 










N24 


15. 


6 


11.2 


8.4 


15. 


3 


24 . 


I 


2 1 . 


y 


i "3 . 


2 


12 . / 


6 . 


6 






K2 6 


20. 


9 


15.7 


12. 1 


18. 


6 


27. 


9 


26. 


6 


23. 


3 


13 . 1 


1 1 . 


6 


S . 


9 


C30 


8 . 


7 


5.6 


10. 6 


16. 


6 


24 . 


1 


20. 


2 


15. 


7 


9.8 


6 . 


8 


6 . 


9 


F33 


16. 


5 


15.4 


4 . 2 


7 . 


1 


15. 


0 


12. 


3 


9. 


6 


6.1 


5. 


6 


9 . 


3 


Y35 


17. 


2 


17.3 


7.8 


5. 


8 


11. 


0 


7 . 


6 


4 . 


9 


4 . 3 


3. 


8 


14 . 


8 


S47 


4 . 


7 


9.1 


15.3 


18. 


5 


23. 


1 


17. 


6 


12^ 


8 


9.1 


12. 


0 


15. 


3 


D50 


5. 


5 


7.7 


14.7 


18. 


6 


24 . 


2 


19. 


2 


14 . 


7 


9.9 


11. 


0 


14 . 


7 


C51 


7. 


1 


5.4 


11.0 


16. 


4 


23. 


5 


19. 


2 


14. 


6 


8.7 


6. 


9 


9. 


6 


K53 


6. 


3 


5.6 


17.9 


23. 


1 


29. 


6 


24 . 


8 


20. 


3 


15.0 


13. 


8 


iS. 


5 


.R3 9 


23 . 


9 


24.0 


13.0 


9. 


5 


12 . 


0 


11. 


3 


12. 


5 


12.8 


14 . 


7 


20. 






K26 


C30 


F3 3 


V35 


S47 


D50 


C51 


R53 










C30 


12. 


4 
































F33 


13. 


9 


10.1 






























V35- 


19. 


5 


13.5 


6.4 




























S47 


21. 


0 


8.8 


13.5 


13. 


2 
























D50 


20. 


1 


8.6 


14.3 


13. 


7. 


5. 


0 




















C51 


15, 


0 


3.7 


10.9 


12. 


5 


6. 


9 


5. 


2 
















R53 


19. 


9 


9.9 


18.2 


18. 


8 


9. 


4 


5. 


3 


7. 


4 












R39 


24 . 


3 


20.6 


14.4 


9. 


6 


20. 


4 


19. 


0 


13, 


8 


23.4 
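
The table does not state how the hypothetical Cbeta of a glycine was placed. The sketch below shows one way such a table could be computed, assuming the Cbeta is built from the backbone N, CA and C atoms with ideal tetrahedral geometry. The three coefficients are a commonly used approximation and are an assumption here, not taken from this document. The function names and the coordinates in the usage lines are hypothetical.

```python
import math

def sub(u, v):
    """Component-wise difference u - v of two 3D points."""
    return tuple(a - b for a, b in zip(u, v))

def cross(u, v):
    """Cross product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def virtual_cbeta(n, ca, c):
    """Place a hypothetical Cbeta from backbone N, CA, C coordinates.

    Assumption: ideal geometry; coefficients are a widely used approximation.
    """
    b = sub(ca, n)
    c_vec = sub(c, ca)
    a = cross(b, c_vec)
    return tuple(-0.58273431 * a[i] + 0.56802827 * b[i]
                 - 0.54067466 * c_vec[i] + ca[i] for i in range(3))

def cbeta_distance_table(cbetas):
    """Lower-triangular table of Cbeta-Cbeta distances, rounded to 0.1 Angstrom.

    cbetas maps a residue label (e.g. "K15") to its Cbeta coordinates.
    Keys of the result are (row_label, column_label) pairs.
    """
    labels = list(cbetas)
    return {(labels[i], labels[j]):
            round(math.dist(cbetas[labels[i]], cbetas[labels[j]]), 1)
            for i in range(len(labels)) for j in range(i)}

# Usage with made-up coordinates (not BPTI data):
gly_cb = virtual_cbeta((-0.525, 1.363, 0.0), (0.0, 0.0, 0.0), (1.526, 0.0, 0.0))
table = cbeta_distance_table({"G12": gly_cb, "X1": (5.0, 0.0, 0.0)})
print(table)
```

With real coordinates, each entry of the table above would correspond to one key of the returned dictionary.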
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Table 37: vgDNA to vary BPTI set #2.1

[Sequence diagram not recoverable from the scan: codons and encoded amino acids for residues 35 through 74, carried on two overlapping oligonucleotides (72 nts and 78 nts) with spacers and with Apa I and Sph I sites marked.]

Overlap = 12 (7 CG, 5 AT)

k = equal parts of T and G; m = equal parts of C and A;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above

Residue            40     42     50     52     57      ?
Possibilities      21  x  21  x  21  x  21  x  21  x  21  = 8.6 x 10^7
Abundance x 10
of PPBD          .768   .271   .459   .671   .600   .459
Product = 1.77 x 10^-2

Parent = 1/(5.5 x 10^7)    least favored = 1/(4.2 x 10^?)
Least favored one-amino-acid substitution from PPBD present
at 1 in 1.6 x 10^?
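
The library statistics of Table 37 can be checked with a few lines of arithmetic. This is a minimal sketch that assumes six varied positions with 21 possibilities each (twenty amino acids plus stop) and uses the six "Abundance x 10" figures as read from the table. The variable names are illustrative only.

```python
from math import prod

# Assumed reading of Table 37: six varied residues, 21 possibilities each.
possibilities = [21] * 6

# "Abundance x 10" of the parental amino acid at each varied residue.
abundance_x10 = [0.768, 0.271, 0.459, 0.671, 0.600, 0.459]

# Number of distinct encoded sequences (stop counted as a possibility).
n_sequences = prod(possibilities)

# Product of the tabulated figures, as quoted in the table.
product_x10 = prod(abundance_x10)

# True abundance at each position is the tabulated figure divided by 10,
# so the parental sequence makes up this fraction of the population.
parent_fraction = prod(a / 10 for a in abundance_x10)

print(f"possibilities   = {n_sequences:.2e}")
print(f"product (x 10)  = {product_x10:.3e}")
print(f"parent fraction = 1/{1 / parent_fraction:.2e}")
```

The same calculation applies to any other variegation scheme once its per-position possibilities and abundances are listed.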



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Table 41: vgDNA set #2 of BPTI 2.3

[Sequence diagram not recoverable from the scan: codons and encoded amino acids for residues 29 through 64, beginning at an Xho I site (residues 29-30, CTC GAG), carried on two overlapping oligonucleotides, olig #33 (71 nts) and olig #34 (67 nts).]

Overlap = 13 (7 CG, 6 AT)

k = equal parts of T and G; m = equal parts of C and A;
w = equal parts of A and T; n = equal parts of A,C,G,T;
d = equal parts A,G,T; v = equal parts A,C,G;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above

Residue            32     34     40     44     50     52     55     57
Possibilities       6  x   6  x  21  x   6  x   3  x   5  x  21  x  21  = 3 x 10^7
Abundance x 10
of PPBD          10/6   10/6   .545   10/6   10/3   30/8   .459   .701
Product = 1.01 x 10^1

Parent = 1/(1 x 10^7)    least favored = 1/(4 x 10^?)

Least favored one-amino-acid substitution from PPBD present
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A method of obtaining a protein that binds a 
predetermined target that comprises:

a) preparing a variegated population of replicable
genetic packages, each package including a nucleic
acid construct coding on expression for an outer-
surface-displayed potential binding protein
comprising (i) a structural signal directing the
display of the protein on the outer surface of the
package and (ii) a potential binding domain for
binding said target, where a plurality of
different potential binding domains are displayed
by said population,

b) causing the expression of said proteins and the
display of said proteins on the outer surface of
such packages,

c) contacting the packages with target material so
that the potential binding domains of the proteins
and the target material may interact, and
separating packages bearing a binding domain that
binds target material from packages that do not so
bind, and

d) recovering and replicating at least one package
bearing a successful binding domain.
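
Steps (c) and (d) amount to a select-and-amplify cycle that can be repeated. The toy model below is only a sketch of why repetition enriches a rare binder: it assumes each kind of package is retained on the target with a fixed probability per round and that amplification restores the population without bias. The retention probabilities, starting fractions and names are hypothetical and are not taken from this document.

```python
def enrich(fractions, retention, rounds):
    """Expected population fractions after repeated select-and-amplify rounds.

    fractions: starting fraction of each package type (sums to 1).
    retention: probability that a package of each type is retained on the
               target during one affinity separation (assumed constant).
    Returns the list of fraction dictionaries, one per round, starting
    with the initial population.
    """
    history = [dict(fractions)]
    for _ in range(rounds):
        # Step (c): separation keeps each type in proportion to its retention.
        kept = {name: fractions[name] * retention[name] for name in fractions}
        # Step (d): unbiased amplification renormalizes the population.
        total = sum(kept.values())
        fractions = {name: value / total for name, value in kept.items()}
        history.append(fractions)
    return history

# Hypothetical example: a binder present at 1 in 10^7 packages.
start = {"binder": 1e-7, "background": 1 - 1e-7}
keep = {"binder": 0.5, "background": 0.001}
for round_number, state in enumerate(enrich(start, keep, 3)):
    print(round_number, f"{state['binder']:.3g}")
```

Under these assumed numbers each round multiplies the binder-to-background ratio by 500, so a handful of rounds is enough to move a rare package to a majority of the population.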

The method of claim 1 wherein the population of 
replicable genetic packages of step (a) is 
obtained by:

) preparing a variegated population of DNA
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certain predetermined degree of affinity for
target material, and the required degree of
affinity is increased for each new variegated 
population. 

The method of claim 1 wherein the displayable 
potential binding protein is a chimeric protein. 

The method of claim 7 wherein said signal is 
provided by a segment of said chimeric protein 
which is essentially identical in amino acid 
sequence with at least a functional portion of a 
natural outer surface protein encoded by said 
genetic package or a cell naturally infected by 
said genetic package, said portion directing the 
transport of said chimeric protein to the outer 
surface of the genetic package. 

The method of claim 2 wherein the second sequence
is obtained by operably linking a DNA sequence
encoding a potential outer surface transport
signal to a DNA sequence expressing a protein that
confers a selectable phenotype to obtain a test
construct, introducing the test constructs into
suitable hosts, causing expression of said DNA
construct, selecting genetic packages that display
the protein that confers the selectable phenotype
on their outer surface, and choosing as said
second sequence the DNA sequence encoding the
potential outer surface transport signal of one of 
such selected genetic packages; wherein the 
potential outer surface transport signals encoded 
by the individual test constructs are non- 
identical. 
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17. The method of claim 3 in which the binding domain
of the known protein has a known sequence of amino
acids, and the identity and spatial relationship
of the amino acids forming a surface of said
domain is known.

18. The method of claim 3, said target material
comprising one or more discrete molecules, said
parental potential binding domain being
characterized as a sequence of amino acids,
further comprising identifying an interaction set
of amino acids which are on the surface of the
parental potential binding domain and which can
all simultaneously touch a single molecule of the
target material, and obtaining potential binding
domains by substituting a different amino acid for
one or more of the amino acids in said interaction
set.

19. The method of claim 3 wherein the level of
variegation of the population is chosen such that
the packages displaying potential binding domains
obtained by single amino acid substitutions in the
amino acid sequence of the parental potential
binding domain are present in detectable amounts.

20. The method of claim 3 wherein the amino acid
substitutions to be made are chosen after
consideration of the 3D structure of the parental
potential binding domain.

21. The method of claim 15 wherein the amino acid
substitutions to be made are for amino acids of
the chosen domain of the known protein which are
known to be alterable without reducing the melting
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affinity separated and retain viability. 

30. The method of claim 3 in which the initially
chosen parental potential binding protein has at
least one stable binding domain and said domain
has a melting point of at least 60°C and is stable
over a pH range of at least 3.0-8.0.

31. The method of claim 15 wherein the known binding
protein is an enzyme, the activity of which has a
deleterious effect on the replicable genetic
package, the host of the replicable genetic
package, or the target, wherein the majority of
the nucleic acid constructs code on expression for
an analogue of the known binding protein that does
not have such enzymatic activity.

32. The method of claim 1 wherein the target contains
ionizable groups and the pH of the solutions of
the intended use and the pH of the affinity
separations are chosen so that both the potential
binding protein and the target remain stable.

33. The method of claim 1 wherein the target contains
ionizable groups, further comprising providing
counter ions in affinity separations and the
solutions of the intended use to reduce
electrostatic repulsion between the potential
binding protein and the target.

34. The method of claim 1 wherein the initial
potential binding domain is picked so that, under
the conditions of intended use of the desired
binding protein and under the conditions of
affinity separation, that the potential binding
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thereof embodying an outer surface transport
signal.

45. The method of claim 42 wherein the signal is
provided by the gene III protein of M13 or a
segment thereof embodying an outer surface
transport signal.

46. The method of claim 3 wherein the initially chosen
parental potential binding domain is at least 50%
homologous with the binding domain of bovine
pancreatic trypsin inhibitor, having the residues
C5, G12, C30, F33, G37, C51 and C55.
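
The two conditions of claim 46 can be sketched as a simple check. Several assumptions are made: "homologous" is read as percent identity over an ungapped alignment of equal-length sequences, the parent is the 58-residue mature BPTI sequence, and the required residues are taken as C5, G12, C30, F33, G37, C51 and C55, which is what native BPTI carries at those positions. The function names are hypothetical.

```python
# Mature bovine pancreatic trypsin inhibitor, residues 1-58.
BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"

# Residues the candidate must retain (1-based positions, assumed reading).
REQUIRED = {5: "C", 12: "G", 30: "C", 33: "F", 37: "G", 51: "C", 55: "C"}

def percent_identity(candidate, parent=BPTI):
    """Percent of positions with identical residues (no gaps allowed)."""
    if len(candidate) != len(parent):
        raise ValueError("this sketch only compares equal-length sequences")
    matches = sum(a == b for a, b in zip(candidate, parent))
    return 100.0 * matches / len(parent)

def meets_claim_46_sketch(candidate, threshold=50.0):
    """True if the candidate keeps the required residues and is at least
    `threshold` percent identical to BPTI under the assumptions above."""
    if len(candidate) != len(BPTI):
        return False
    keeps_required = all(candidate[pos - 1] == aa
                         for pos, aa in REQUIRED.items())
    return keeps_required and percent_identity(candidate) >= threshold

# Hypothetical variants: K15R keeps every required residue, C5A does not.
k15r = BPTI[:14] + "R" + BPTI[15:]
c5a = BPTI[:4] + "A" + BPTI[5:]
print(meets_claim_46_sketch(k15r), meets_claim_46_sketch(c5a))
```

A production check would use a proper alignment with gaps; the ungapped comparison is kept here only to make the two conditions explicit.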



47. The method of claim 46 further specifying that: a)
residue 21 contains one of the amino acids Y, F,
W, or I; b) residue 23 contains one of the amino
acids Y or F; c) residue 35 contains one of the
residues Y, F, or W; d) residue 40 contains one of
the amino acids G or A; e) residue 45 contains
either F or Y.
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48. The method of claim 47 wherein the residues to be
varied are chosen from among residues 17, 19, 21,
27, 28, 29, 31, 32, 34, 48, 49, and 52.

49. The method of claim 48 wherein the additional
residues 9, 11, 15, 16, 18, 20, 22, 24, 26, 35,
47, and 53 are allowed to vary.

50. The method of claim 47 wherein the residues to
vary are picked from one of the interaction sets
identified in Table 34.

51. The method of claim 2 wherein the distribution of
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Insensitive to UV, tolerant of desiccation, and
resistant to a pH of 2.0 to 10.0.

58. The method of claim 1 wherein the genetic packages
may be frozen and later revived.

59. The method of claim 1 wherein the genetic package
is a cell with a doubling time of 20-40 minutes.

60. The method of claim 1 wherein the genetic package
is a virus with a burst size of at least
100/infected cell.

61. The method of claim 1 wherein the genetic packages
are harvested by centrifugation without loss of
viability.

62. The method of claim 3 wherein the initially chosen
parental potential binding domain is selected from
the group consisting of (a) binding domains of
bovine pancreatic trypsin inhibitor, crambin,
ovomucoid, T4 lysozyme, hen egg white lysozyme,
ribonuclease, and azurin, and (b) domains at least
50% homologous with any of the foregoing domains
and which have a melting point of at least 60°C.

63. The method of claim 36 wherein the outer surface
transport signal is provided by the lamB protein
or a segment thereof embodying an outer surface
transport signal.

64. The method of claim 38 wherein the outer surface
transport signal is provided by the cotA, cotB,
cotC or cotD protein or a segment thereof
embodying an outer surface transport signal.
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is further chosen to yield the largest value for 
the quantity ((1 - abundance(stop codons)) times
(abundance of the least abundant amino
acid)/(abundance of the most abundant amino
acid)).
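
The quantity named in this claim can be evaluated for any nucleotide mix at the three codon positions. The sketch below assumes the standard genetic code and uses, as an example, the q, f and k proportions quoted in the table legends (q = .26 T, .18 C, .26 A, .30 G; f = .22 T, .16 C, .40 A, .22 G; k = equal parts T and G), compared with an equimolar NNK codon. The function names are hypothetical, and the minimum is taken only over amino acids that the mix actually encodes.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def aa_distribution(pos1, pos2, pos3):
    """Abundance of each encoded amino acid ('*' = stop) for a codon whose
    three positions are synthesized with the given base proportions."""
    dist = {}
    for (a, pa), (b, pb), (c, pc) in product(pos1.items(), pos2.items(),
                                              pos3.items()):
        aa = CODE[a + b + c]
        dist[aa] = dist.get(aa, 0.0) + pa * pb * pc
    return dist

def figure_of_merit(dist):
    """(1 - abundance(stop)) * (least abundant aa) / (most abundant aa)."""
    stop = dist.get("*", 0.0)
    amino = [v for aa, v in dist.items() if aa != "*"]
    return (1.0 - stop) * min(amino) / max(amino)

q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
f = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
k = {"T": 0.5, "G": 0.5}
n = {base: 0.25 for base in BASES}

qfk = aa_distribution(q, f, k)
nnk = aa_distribution(n, n, k)
print(f"qfk: stop={qfk['*']:.3f} merit={figure_of_merit(qfk):.3f}")
print(f"NNK: stop={nnk['*']:.3f} merit={figure_of_merit(nnk):.3f}")
```

Searching over base proportions for the mix that maximizes `figure_of_merit` would be the natural next step; the sketch only evaluates two fixed mixes.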

72. The protein of claim 66, wherein the protein
comprises a first foreign domain recognizing a
first target material and a second foreign domain
recognizing a second target material.
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[Figure: flowchart of the process. Box labels as far as they can be read from the scan:]

Choose Genetic Package,
Outer Surface Protein,
OCV, and IPBD

Set PPBD=IPBD

Choose Residues to Vary

Synthesize vgDNA. Clone into OCV.
Introduce into wtGP to Obtain GP(PBD)

Cause GPs to Express and Display PBDs

Use Affinity Separation to Isolate
GP(SBD)s from Other GP(PBD)s

Enrichment
Cycle

Yes

Further Enrichment Needed?

Recover and Amplify GP(SBD)s

Characterize Isolated PBDs

Yes    Binding Good Enough?

Set PPBD = SBD
or, if no SBD,
Set PPBD=IPBD

Done